Enforcing max query depth with Absinthe

Absinthe is a popular GraphQL library for Elixir. It’s the de facto choice if you want to publish a GraphQL API. Vetspire uses it for its public API, where at peak we handle upwards of 80,000 requests/minute.

Since it’s up to the API clients to decide what they query, securing your API comes with its own complexity and challenges. Some strategies include:

  • Enforcing a timeout
  • Rate limiting
  • Complexity limits
  • Enforcing max query depth

You can find out more about these on the How to GraphQL website. While you’ll probably want to investigate all of these strategies and mix and match them depending on your needs, this article focuses on the last one.

The problem

GraphQL schemas are often cyclic graphs (hence the name), allowing you to craft queries such as:

query IAmEvil {
  author(id: "abc") {
    posts {
      author {
        posts {
          author {
            posts {
              author {
                # that could go on as deep as the client wants!
              }
            }
          }
        }
      }
    }
  }
}

Even if your users aren’t malicious, they might craft queries which are less than ideal. In our case, we found some of our API users reaching a depth of 10+, requesting a ton of data in the process, which turned out to be less efficient than splitting the query up.

Simple solution

Absinthe doesn’t seem to offer max query depth as a configuration option. The Safety Limits page in the documentation covers setting a complexity per field and enforcing a limit on it, as well as token limits.
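
For reference, the complexity-based approach from that page looks roughly like this; the field, argument, and limits below are purely illustrative:

# In the schema: give a field a cost derived from its arguments (illustrative names/values).
field :posts, list_of(:post) do
  arg :limit, :integer, default_value: 10

  complexity fn %{limit: limit}, child_complexity ->
    # Cost of this field = number of requested items * cost of each child selection.
    limit * child_complexity
  end
end

# When executing a document, reject anything above a complexity budget.
Absinthe.run(document, AppWeb.Schema, analyze_complexity: true, max_complexity: 50)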

Googling for it yielded a helpful Stack Overflow post, Absinthe Graphql nested queries security, where Marcos Tapajós suggests:

You can write a middleware to check the selection_depth and block it. Something like that:

@impl Absinthe.Middleware
def call(res, _config) do
  IO.inspect(selection_depth(res.definition.selections))
  res
end
def selection_depth([], depth), do: depth + 1
def selection_depth(selections, depth \\ 0),
  do: selections |> Enum.map(&selection_depth(&1.selections, depth + 1)) |> Enum.max()

Let’s try it out! Here’s a middleware that utilises this logic:

defmodule AppWeb.Middleware.MaxQueryDepth do
  @moduledoc """
  Middleware that allows us to refuse queries that have a depth greater than a
  certain value.
  """
  @behaviour Absinthe.Middleware

  require Logger

  def call(resolution, _config) do
    selection_depth = selection_depth(resolution.definition.selections)

    max_query_depth = max_query_depth()

    if selection_depth > max_query_depth do
      Absinthe.Resolution.put_result(
        resolution,
        {:error,
         %{
           code: :query_depth_limit,
           message:
             "Query has depth of #{selection_depth}, which exceeds max depth of #{max_query_depth}."
         }}
      )
    else
      resolution
    end
  end

  def selection_depth(selections \\ [], depth \\ 0)
  def selection_depth([] = _selections, depth), do: depth + 1

  def selection_depth(selections, depth),
    do: selections |> Enum.map(&selection_depth(&1.selections, depth + 1)) |> Enum.max()

  defp max_query_depth, do: Application.get_env(:app_web, :max_query_depth)
end

Note that we configure max_query_depth via runtime.exs so that we can change the limit depending on environment.
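
As a minimal sketch (the environment variable name and default here are assumptions):

# config/runtime.exs
import Config

config :app_web,
  max_query_depth: String.to_integer(System.get_env("MAX_QUERY_DEPTH", "10"))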

We include this in our default middleware across the entire schema:

defmodule AppWeb.Schema do
  use Absinthe.Schema

  def middleware(middleware, _field, _object) do
    middleware ++ [AppWeb.Middleware.MaxQueryDepth]
  end
end

Testing

We should try it out - better yet, let’s write a test case for this middleware:

defmodule AppWeb.Middleware.MaxQueryDepthTest do
  use AppWeb.ConnCase, async: true

  @sample_query """
  query {
    org {
      id
      providers {
        id
        org {
          id
          providers {
            id
            org {
              id
            }
          }
        }
      }
    }
  }
  """

  describe "max_query_depth" do
    test "returns results when limit isn't reached", ctx do
      Application.put_env(:app_web, :max_query_depth, 50)
      assert %{"id" => _id, "providers" => _providers} = run_graphql!(ctx.conn, @sample_query)
    end

    test "returns an error when limit is reached", ctx do
      Application.put_env(:app_web, :max_query_depth, 5)

      assert %{
               "errors" => [
                 %{
                   "code" => "query_depth_limit",
                   "message" => "Query has depth of 5, which exceeds max depth of 4."
                 }
               ]
             } = run_graphql(ctx.conn, @sample_query)

      Application.put_env(:app_web, :max_query_depth, 50)
    end
  end
end

We have some utility functions that make this test simpler:

  • our default context has a conn set up
  • we have run_graphql and run_graphql! utility functions; run_graphql boils down to this:
conn
|> post("/graphql", query: query)
|> json_response(200)
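
run_graphql! is the bang variant that returns just the data of the single root field; roughly something like this sketch (the exact implementation is an assumption based on the assertions above):

# Hypothetical bang variant, building on run_graphql/2 above.
def run_graphql!(conn, query) do
  %{"data" => data} = run_graphql(conn, query)

  # Return the value of the single root field, e.g. the org object in @sample_query.
  data
  |> Map.values()
  |> hd()
end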

In any case, we just proved the code works! Thanks Marcos, even though your answer is from 2018, it still holds up!

Handling fragments

Unfortunately, upon further testing, I found an issue: if you utilise GraphQL fragments, you can fool the logic we’ve implemented. Let’s write a test for it:

@query_with_fragments """
query getNotifications {
  notifications {
    ...NotificationBasics
  }
}

fragment NotificationBasics on Notification {
  id
  patient {
    ...PatientDetails
  }
}

fragment PatientDetails on Patient {
  id
  name
  client {
    ...ClientDetails
  }
}

fragment ClientDetails on Client {
  id
  name
  patients {
    name
    client {
      id
    }
  }
}
"""
test "query with fragments is forbidden when it exceeds the limit", ctx do
  Application.put_env(:app_web, :max_query_depth, 5)

  assert %{
            "errors" => [
              %{
                "code" => "query_depth_limit",
                "message" => "Query has depth of 6, which exceeds max depth of 5."
              }
            ]
          } = run_graphql(ctx.conn, @query_with_fragments)

  Application.put_env(:app_web, :max_query_depth, 50)
end

This ends up causing an error in the implementation we have:

** (KeyError) key :selections not found in: %Absinthe.Blueprint.Document.Fragment.Spread{
  name: "NotificationBasics",
  directives: [],
  source_location: %Absinthe.Blueprint.SourceLocation{line: 3, column: 5},
  complexity: 8,
  flags: %{},
  errors: []
}

This is because our current selection_depth implementation assumes every selection has a selections key, but a Fragment.Spread node, like the NotificationBasics spread shown in the error above, has no selections of its own.

To resolve this, we can reach for resolution.fragments, which holds the fragment definitions. It’s a map keyed by fragment name; let’s take a look at our NotificationBasics fragment. Mind you, these structs are huge, so this is a simplified version:

%Absinthe.Blueprint.Document.Fragment.Named{
  name: "NotificationBasics",
  selections: [
    %Absinthe.Blueprint.Document.Field{
      name: "patient",
      selections: [
        %Absinthe.Blueprint.Document.Fragment.Spread{
          name: "PatientDetails",
          complexity: 7,
          errors: []
        }
      ]
    }
  ]
}

This tells us that this fragment has selections which include another fragment — which allows us to recursively look them all up till we get to the bottom of it.

Let’s change our selection_depth function to also take the fragment definitions, so we can look them up and dive into their selections until there’s nothing left:

def selection_depth(fragments \\ [], selections \\ [], depth \\ 0)
def selection_depth(_fragments, selections, depth) when selections == [], do: depth + 1

def selection_depth(fragments, selections, depth) do
  selections
  |> Enum.map(fn selection ->
    case selection do
      # A fragment spread has no selections of its own: look the named fragment up
      # in the fragment definitions we were given and keep going at the same depth.
      %Absinthe.Blueprint.Document.Fragment.Spread{} = fragment ->
        selections =
          fragments
          |> Enum.find(fn {name, _} -> name == fragment.name end)
          |> elem(1)
          |> Map.get(:selections)

        selection_depth(fragments, selections, depth)

      # A regular field adds one level of depth.
      field ->
        selection_depth(fragments, field.selections, depth + 1)
    end
  end)
  |> Enum.max()
end

Let’s change how we call selection_depth/3:

selection_depth =
  selection_depth(resolution.fragments, resolution.definition.selections)

It might not be the prettiest code, but our test now passes!

Benchmarking

I didn’t realise this at first, but the middleware is called for every field in the query. There might be a way to do it once per query (please let me know your thoughts!), but I wanted to benchmark this code to confirm we’re not slowing requests down. I figured it would probably be fine, since we’re just performing quick lookups on maps, but I wanted to be sure.
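
One idea, which I haven’t tested, would be to short-circuit on anything that isn’t a root-level field, since the depth calculation on a root field already covers its whole sub-tree. Something like this sketch, where check_depth/1 is a hypothetical helper wrapping the existing logic:

def call(resolution, _config) do
  # Absinthe.Resolution.path/1 returns a one-element path for root-level fields,
  # so we only pay for the depth calculation once per top-level field.
  if length(Absinthe.Resolution.path(resolution)) == 1 do
    # check_depth/1 is a hypothetical helper containing the depth check shown earlier.
    check_depth(resolution)
  else
    resolution
  end
end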

We can use benchee to benchmark this. In our case, we also used fun_with_flags to enable or disable the feature altogether, which lets us easily test both versions:

Benchee.run(
  %{
    "with_max_query_depth" => {
      fn _input -> VetspireWeb.Support.TestUtils.run_graphql!(conn, sample_query) end,
      before_scenario: fn _input -> FunWithFlags.enable(:FeatureMaxQueryDepth) end,
      after_scenario: fn _input -> FunWithFlags.disable(:FeatureMaxQueryDepth) end
    },
    "without_max_query_depth" => fn ->
      VetspireWeb.Support.TestUtils.run_graphql!(conn, sample_query)
    end
  },
  time: 10,
  memory_time: 2
)

The results:

Name                              ips        average  deviation         median         99th %
with_max_query_depth            15.81       63.27 ms     ±6.35%       61.84 ms       79.90 ms
without_max_query_depth         15.71       63.67 ms     ±6.83%       62.18 ms       82.49 ms

Comparison:
with_max_query_depth            15.81
without_max_query_depth         15.71 - 1.01x slower +0.40 ms

Memory usage statistics:

Name                            average  deviation         median         99th %
with_max_query_depth            1.54 MB     ±0.01%        1.54 MB        1.54 MB
without_max_query_depth         1.54 MB     ±0.00%        1.54 MB        1.54 MB

Comparison:
with_max_query_depth            1.54 MB
without_max_query_depth         1.54 MB - 1.00x memory usage +0.00008 MB

All in all, I didn’t observe any performance difference!

Release process

When we released this to production almost a year ago, I did it in two steps:

  1. Release the code that analyses the query depth and logs queries that hit a pre-defined high threshold (see the sketch after this list). We chose to log anything above a depth of 5, just to observe what our API users do and see what seems reasonable. We had a graph showing the frequency of API calls reaching this threshold (it helps to establish a minimum to minimise logs/noise in metrics). This also gave us an idea of the maximum reasonable depth based on real usage.
  2. As mentioned, we released it feature-flagged at first, to ensure we wouldn’t see a performance hit at a production volume of traffic that would be hard to replicate in a benchmarking situation.
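
A rough sketch of what the log-only first step might look like; the threshold and message below are illustrative, and it relies on the require Logger already in the module:

# Step 1 variant of call/2: observe and log, but never block.
def call(resolution, _config) do
  depth = selection_depth(resolution.fragments, resolution.definition.selections)

  # Illustrative threshold: log anything deeper than 5 so we can graph how often it happens.
  if depth > 5 do
    Logger.warning("GraphQL query depth of #{depth} exceeds our observation threshold")
  end

  resolution
end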

It’s been a success, and the code has now been in production for almost a year.