Engineering

How atom keys with JSON in Ecto can break your system

In Elixir, you can use, among others, a string and an atom as a map key. Let's start with an atom as a map key:

%{:key => "value"}

Since Elixir provides syntactic sugar for maps with atom keys, such a map is usually written with the colon syntax:

%{key: "value"}

You can also use a string as a map key. Then it looks as follows:

%{"key" => "value"}

The most important difference between these two kinds of map keys is how you access them.

To access a map with atom keys, we can use either the map.key or the map[:key] notation. The first one raises a KeyError if the map doesn't contain the given key (in this case, :key). The second one returns nil in such a case.

To get a value from a map with string keys, you need to use the map["key"] notation. If the key is not in the map, it returns nil.

The map.key syntax is used with maps that hold a predetermined set of atom keys which are expected to always be present. The map[key] syntax, on the other hand, is used for dynamically created maps that may have keys of any type.
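The difference is easy to see in a quick sketch:

```elixir
atom_map = %{key: "value"}

atom_map.key        # "value"
atom_map[:key]      # "value"
atom_map[:missing]  # nil
# atom_map.missing would raise KeyError

string_map = %{"key" => "value"}

string_map["key"]      # "value"
string_map["missing"]  # nil
# string_map.key would also raise KeyError, since the dot
# syntax always looks up an atom key
```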

Can the way of accessing cause any problem?

Bearing all that in mind, let's analyze the following scenario.

To get an account balance of a user, we need to fetch all their transactions.

defmodule Transaction do
  use Ecto.Schema

  import Ecto.Query

  schema "transactions" do
    field :user_id, :binary_id
    field :transaction_number, :integer
    field :title, :string
    field :amount, :float
  end

  def fetch(user_id) do
    __MODULE__
    |> where(user_id: ^user_id)
    |> order_by(asc: :transaction_number)
    |> Repo.all()
  end
end

Based on these transactions, the system calculates an account balance.

defmodule Account do
  def balance(transactions) do
    Enum.reduce(transactions, 0, &(&2 + &1.amount))
  end
end
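With a few hypothetical transactions, the reduction works like this:

```elixir
transactions = [
  %{amount: 100.0},
  %{amount: -25.0},
  %{amount: 50.0}
]

# Sum the amounts, starting from 0:
Enum.reduce(transactions, 0, &(&2 + &1.amount))
# => 125.0
```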

To avoid system performance problems, we decided to snapshot the transactions once their number reaches 10 thousand: whenever the balance/2 function processes that many transactions, it saves a snapshot.

defmodule Transaction.Snapshot do
  use Ecto.Schema
  
  schema "transactions_snapshots" do
    field :user_id, :binary_id
    field :transaction, :map
    field :transaction_number, :integer
  end

  def save(user_id, balance, transaction_number) do
    %__MODULE__{
      user_id: user_id,
      transaction: %{title: "snapshot", amount: balance},
      transaction_number: transaction_number
    }
    |> Repo.insert!(
      on_conflict: :replace_all,
      conflict_target: :user_id
    )

    :ok
  end

  def get(user_id) do
    Repo.get_by(__MODULE__, user_id: user_id) || default(user_id)
  end

  defp default(user_id) do
    %__MODULE__{
      user_id: user_id,
      transaction: %{title: "initial snapshot", amount: 0},
      transaction_number: 0
    }
  end
end

defmodule Account do
  def balance(user_id, transactions) do
    balance = Enum.reduce(transactions, 0, &(&2 + &1.amount))
    
    transactions_number = length(transactions)

    if transactions_number >= 10_000 do
      :ok =
        Transaction.Snapshot.save(
          user_id,
          balance,
          transactions_number
        )
    end
    
    balance
  end
end

For keeping snapshots to make sense, we also need to fetch the latest one whenever transactions are fetched.

defmodule Transaction do
  # ...

  def fetch(user_id) do
    %{
      transaction: transactions_snapshot,
      transaction_number: last_transaction_number
    } = Transaction.Snapshot.get(user_id)

    __MODULE__
    |> where(user_id: ^user_id)
    |> where([t], t.transaction_number > ^last_transaction_number)
    |> order_by(asc: :transaction_number)
    |> Repo.all()
    |> add_snapshot(transactions_snapshot)
  end

  defp add_snapshot(transactions, transactions_snapshot) do
    [transactions_snapshot | transactions]
  end
end

From now on, we don't have to worry about users who have millions of transactions. We just run Transaction.fetch/1, and it fetches no more than the last 10 thousand records; the rest are summed up in a snapshot. Now we can pass the result to the Account.balance/2 function.

defmodule Account do
  def balance(user_id, transactions) do
    balance = Enum.reduce(transactions, 0, &(&2 + &1.amount))
    
    # save snapshot
    
    balance
  end
end

Are we sure that the function above will work if it gets transactions with a snapshot?

Let's take a look at our transactions variable.

[
  %{"title" => "snapshot", "amount" => 100},
  %Transaction{..., title: "...", amount: 200}
]

It contains maps, but they have different kinds of keys. The first one has string keys; the second one (an Ecto struct) has atom keys. Because of that, the Account.balance/2 function crashes with:

(KeyError) key :amount not found in: %{"amount" => 100, "title" => "snapshot"}

Why? Because we access the maps using the map.key notation, and our snapshot cannot be accessed this way.

Why does the map from our snapshot have string keys, even though we inserted a map with atom keys into the database?

def save(user_id, balance, transaction_number) do
  %__MODULE__{
    user_id: user_id,
    transaction: %{title: "snapshot", amount: balance},
    transaction_number: transaction_number
  }
  |> Repo.insert!(
    on_conflict: :replace_all,
    conflict_target: :user_id
  )

  :ok
end

The database converts atom keys to strings because a :map field is stored as JSON(B) in PostgreSQL, and JSON object keys are always strings; there is no "atom" type there. That's why it's advised to use string-keyed maps instead of atom-keyed ones for such fields.
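We can see this effect without a database. Assuming the Jason library (a JSON encoder commonly used with Ecto) is available, a round trip through JSON already loses the atoms:

```elixir
# JSON object keys are always strings, so atom keys are
# serialized as strings:
json = Jason.encode!(%{title: "snapshot", amount: 100})

# Decoding gives back a string-keyed map — the atoms are gone:
Jason.decode!(json)
# => %{"amount" => 100, "title" => "snapshot"}
```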

How can we prevent that?

Whenever we add a :map type field to our schema, we need to make sure that any code accessing that map uses the map["key"] notation.

In our case, the balance/2 function gets the list of transactions returned from the database, which contains atom-keyed structs. Because of that, we cannot access them using the map["key"] notation.
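Structs do not implement the Access behaviour, so bracket access on them raises. A sketch with a bare struct standing in for the Ecto schema:

```elixir
defmodule DemoTransaction do
  # A minimal stand-in for the Transaction schema
  defstruct [:title, :amount]
end

t = %DemoTransaction{title: "t1", amount: 200}

t.amount
# => 200

# t[:amount] raises, because structs don't implement Access:
try do
  t[:amount]
rescue
  UndefinedFunctionError -> :structs_have_no_access
end
# => :structs_have_no_access
```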

One way to fix this is to modify the get/1 function in the Snapshot module.

defmodule Transaction.Snapshot do
  # ...

  def get(user_id) do
    from_db(user_id) || default(user_id)
  end

  defp from_db(user_id) do
    __MODULE__
    |> Repo.get_by(user_id: user_id)
    |> convert_keys_to_atoms()
  end

  defp convert_keys_to_atoms(nil), do: nil

  defp convert_keys_to_atoms(snapshot) do
    %{transaction: transaction} = snapshot

    atom_key_map =
      Map.new(transaction, fn {k, v} ->
        {String.to_existing_atom(k), v}
      end)

    Map.put(snapshot, :transaction, atom_key_map)
  end

  # ...
end

Now each snapshot returned from the database will have an atom-keyed map, so the Account.balance/2 function won't have problems accessing it anymore.
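One caveat about the conversion above: it uses String.to_existing_atom/1 rather than String.to_atom/1 on purpose. Atoms are never garbage-collected, so converting arbitrary strings coming from the database could slowly exhaust the atom table. A minimal sketch:

```elixir
# These atoms already exist in our code (the schema defines them),
# so converting their string forms back is safe:
_schema_keys = [:title, :amount]

String.to_existing_atom("title")
# => :title

# A string that never existed as an atom raises instead of
# silently creating a new atom the way String.to_atom/1 would:
try do
  String.to_existing_atom("some_unexpected_key")
rescue
  ArgumentError -> :rejected
end
# => :rejected
```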