Migrating live data is risky. Let’s analyze the example below to see why.
Each year, triathlon races are held in many countries. Triathletes from around the world can sign up for one of 4 distances.
defmodule Triathlon.Race do
use Ecto.Schema
schema "triathlon_races" do
field :limit, :integer
field :distance, :string
# bunch of other fields
end
def sign_up(race_id, user_id) do
Repo.transaction(fn ->
race = fetch_exclusively(race_id)
case any_entry_available?(race) do
true -> add_participant(race, user_id)
false -> {:error, "All entries have been sold out."}
end
end)
end
defp fetch_exclusively(race_id) do
Race
|> where(id: ^race_id)
|> lock("FOR UPDATE")
|> Repo.one()
end
defp any_entry_available?(%Race{
id: race_id,
limit: limit
}) do
signed_up_number = fetch_participants_number(race_id)
limit > signed_up_number
end
end
defmodule Triathlon.Participant do
use Ecto.Migration
schema "triathlon_participants" do
field :user_id, :integer
field :race_id, :integer
# bunch of other fields
end
end
Notice that every time someone tries to register, we have to fetch the participant count. We also have to use a lock to prevent race conditions. This can be slow when many people are trying to register at the same time.
To solve that, we decide to add the entries_left field.
defmodule Migrations.AddCounterCacheOnTriathlonRaces do
use Ecto.Migration
def change do
alter table(:triathlon_races) do
add :entries_left, :integer
end
execute("""
UPDATE triathlon_races
SET entries_left = triathlon_races.limit - participants_subquery.number
FROM (
SELECT race_id, count(*) AS number
FROM triathlon_participants
GROUP BY race_id
) AS participants_subquery
WHERE triathlon_races.id = participants_subquery.race_id;
""")
end
end
While signing up for a race, we can check left entries based on the triathlon_races table. When a user successfully registers, the number of available entries decreases.
defmodule Triathlon.Race do
# bunch of other logic
def sign_up(race_id, user_id) do
Repo.transaction(fn ->
race = fetch_exclusively(race_id)
case any_entry_available?(race) do
true -> add_participant(race, user_id)
false -> {:error, "All entries have been sold out."}
end
end)
end
defp any_entry_available?(%Race{entries_left: entries_left}) do
entries_left > 0
end
defp add_participant(race, user_id) do
Race
|> Ecto.Changeset.change(%{
entries_left: race.entries_left - 1
})
|> Repo.update()
:ok = Participant.add(new_race, user_id)
end
end
We made the Race and Participant modules less coupled in terms of operational and temporal coupling. What’s more, it’s worth noting that available entries are the crucial part of the business logic. That’s why it’s good to keep them in the Race module.
What will happen if we deploy both the migration and the new code?
Once it is deployed to the production, our server runs the new code with the new migration which is run in a transaction. It means that until the entire process of the migration is finished, the changes entered by the migration are not visible by the system.
What if the table that we want to update contains thousands of records? This process can take several minutes then. Since the migration does UPDATE on the triathlonraces table and the fetchexclusively/1 function does LOCK FOR UPDATE, the registration process will wait until the lock can be acquired. It means that users won’t be able to register for a race at the time.
One way is to implement this process in four steps. Deploying each of them separately to production:
defmodule Triathlon.Race do
# bunch of other logic
def organize_race(%NewRace{} = cmd) do
%Race{
# bunch of other code
entries_left: cmd.entries_left
} |> Repo.insert!()
end
end
defmodule Triathlon.Race do
# bunch of other logic
def sign_up(race_id, user_id) do
Repo.transaction(fn ->
race = fetch_exclusively(race_id)
case any_entry_available?(race) do
true -> add_participant(race, user_id)
false -> {:error, "All entries have been sold out."}
end
end)
end
defp add_participant(race, user_id) do
if new_race?(race) do
Race
|> Ecto.Changeset.change(%{
entries_left: race.entries_left - 1
})
|> Repo.update()
end
:ok = Participant.add(new_race, user_id)
end
defp new_race?(%Race{entries_left: entries_left}) do
not is_nil(entries_left)
end
end
Without this step we assume that none would register between the process of updating the entries_left field (see step 3) and replacing the old code with the new one (see step 4).
If someone signed up in that specific moment, we would have a race with the entriesleft field equal nil. It means that anyentry_available?/1 function would return nil for this race, although the system wouldn’t know how many available entries there are.
Such a scenario silently destroys our data consistency, because a user can register for a race which has all entries sold out.
Seemingly simple migrations are dangerous and when executed incorrectly can lead to errors and downtimes. Each should be planned and carried out in such a way that the system is available all the time. What’s more, run migrations should be reversible. It means that, if there appears any errors, we should be able to rollback these changes, restoring system stability.