When I submitted a PR, a friend asked:
Why are you using “Repo.preload/3” and not “Query.preload/3?
And then the second one pitched in as he saw preload as the first function call in one of my longer queries:
Why are you preloading for all of the records?
It got me thinking about which solution is better. Let’s find out what the difference is between Ecto.Query.preload/3
and Ecto.Repo.preload/3
.
TL;DR
For those in a rush here you can get to know the essence of my findings.
When using either of preload/3
for it's default job to load associations for every record, like:
Demo.Blog.Post |> Ecto.Query.preload(:comments) |> Demo.Repo.all()
# or
Demo.Blog.Post |> Demo.Repo.all() |> Demo.Repo.preload(:comments)
It doesn’t matter which one we use. These generate the same queries. Looking at time efficiency, they are comparable.
Nevertheless, there are areas in which one is better than the other.
Like, Ecto.Query.preload/3
has the ability to limit the number of queries by using join
and Ecto.Repo.preload/3
can limit the number of records fetched by not loading again already present associations.
How did I get to these conclusions? Let’s read ahead and find out together.
hexdocs.pm
My first point of interest was documentation pages. I wanted to verify whether official documentation will provide some answers right away.
Ecto.Query.preload/3 \ Preloads the associations into the result set.
Ecto.Repo.preload/3 \ Preloads all associations on the given struct or structs. This is similar to
Ecto.Query.preload/3
\ except it allows you to preload structs after they have been fetched from the database. \ In case the association was already loaded, preload won't attempt to reload it.
After reading this, I felt a bit disappointed. These short descriptions provided only part of the information I was after. They covered what I have already known - you can preload associations in two different places:
- while fetching records,
- when they are already present.
I wasn’t satisfied with this. I wanted to know more, so I ditched documentation for now and went ahead to step number two.
The code
As described in the docs, both preload/3
work the same way fetching associations. They generate separate query (or queries, depending on how many associations we want to fetch) to get all needed records.
defmodule Demo.Blog.Posts do
use Ecto.Schema
import Ecto.Changeset
schema "posts" do
field :content, :string
has_many(:comments, Demo.Blog.Comments, foreign_key: :post_id)
timestamps()
end
end
defmodule Demo.Blog.Comments do
use Ecto.Schema
import Ecto.Changeset
schema "comments" do
field :content, :string
belongs_to(:post, Demo.Blog.Posts)
timestamps()
end
end
So I have created simple relations, put some records in DB and started querying to get real answers for questions that scratched at the back of my head.
Are queries generated by Query.preload/3
and Repo.preload/3
different?
# Demo.Blog.Posts |> preload(:comments) |> Demo.Repo.all()
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 []
SELECT c0."id", c0."content", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id" FROM "comments" AS c0 WHERE (c0."post_id" = ANY($1)) ORDER BY c0."post_id" [[2, 1]]
# Demo.Blog.Posts |> Demo.Repo.all() |> Demo.Repo.preload(:comments)
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 []
SELECT c0."id", c0."content", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id" FROM "comments" AS c0 WHERE (c0."post_id" = ANY($1)) ORDER BY c0."post_id" [[2, 1]]
Aaaand they look the same, not what I had expected, but good to know. Next question...
Does Query.preload/3
preloads for all associations?
# Demo.Blog.Posts |> where(content: "Post text etc") |> preload(:comments) |> Demo.Repo.all()
[debug] QUERY OK source="posts" db=4.0ms queue=6.2ms idle=1766.6ms
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 WHERE (p0."content" = 'Post text etc') []
[debug] QUERY OK source="comments" db=1.1ms queue=2.0ms idle=1777.1ms
SELECT c0."id", c0."content", c0."post_id", c0."inserted_at", c0."updated_at", c0."post_id" FROM "comments" AS c0 WHERE (c0."post_id" = $1) ORDER BY c0."post_id" [1]
Yes! Finally, some answers I was looking for! When we preload from a query, we only do so for the records that were left after filtering. This is further confirmed by the docs (but in my opinion, it’s not said explicitly enough)
... fetch (...) from the database and then do a separate query returning all (...) associated ...
Can I use queries when preloading data?
Most queries that require preload use what I like to call a default form. Which is a passing list of atoms such as:
(...)
|> preload([:comments, :author])
(...)
or
(...)
|> preload([:comments, author: [:friends]])
(...)
Surprisingly you can also use queries inside preload statements to limit, order or filter preloaded records. This will still produce two queries, but you gain more control over associations and what is important here, both Ecto.Query
and Ecto.Repo
functions support this.
Some of use cases, for example, Post -> Comment
associations include:
limiting number of comments fetched for a single post
comments_query = Comments |> limit(5) Posts |> Ecto.Query.preload(comments_query) |> Repo.all()
ordering comments e.g. by date of creation
comments_query = Comments |> order_by(:inserted_at) Posts |> Repo.all() |> Repo.preload(comments_query)
What we do need to watch out for is that such queries are working on a whole subset of data we are fetching, that means that limit/2
in the example above will fetch and associate only 5 comments for all of the posts, not 5 comments for each post.
Is there a way to limit number of queries when preloading?
Short answer: yes, but only for Ecto.Query.preload/3
, we can use join
statements to select and filter our associations \
in the same query. The example below allows for fetching posts and associated comments with single query.
# Demo.Repo.all(
# from p in Demo.Blog.Posts,
# join: c in assoc(p, :comments),
# where: c.inserted_at > p.updated_at,
# preload: [comments: c]
# )
[debug] QUERY OK source="posts" db=13.6ms queue=2.0ms idle=578.7ms
SELECT p0."id", p0."content", p0."inserted_at", p0."updated_at", c1."id", c1."content", c1."post_id", c1."inserted_at", c1."updated_at" FROM "posts" AS p0 INNER JOIN "comments" AS c1 ON c1."post_id" = p0."id" WHERE (c1."inserted_at" > p0."updated_at") []````
So what is better? Right now, I can only answer: it depends. As in almost everything we do, it’s a case by case question we need to answer to achieve the best results.