Collecting GraphQL Live Query Resource Identifier with GraphQL Tools

theguild_

TheGuildBot

Posted on July 29, 2021

Collecting GraphQL Live Query Resource Identifier with GraphQL Tools

This article was published on Thursday, July 1, 2021 by Laurin Quast @ The Guild Blog

Photo by Łukasz Nieścioruk on
Unsplash


Note: This article showcases technical details for implementing a Live Query engine with
graphql-tools. If you are new to Live Queries you might want to check out Subscriptions and Live
Queries - Real Time with GraphQL

first.

GraphQL live queries can solve real-time updates in a more elegant way than GraphQL subscriptions.

Instead of subscribing to events live queries primarily subscribe to data changes.

Instead of updating the client store manually a live query updates the client store magically
without redundant cache update logic on the client.

You can learn more about the differences here.

All those benefits, however, come with the drawback of the server having to become stateful, in
particular, being aware of all the data the client operation consumes and re-executing those query
operations for a specific client once the underlying data changes.

As I first started experimenting with GraphQL live queries the easiest solution was to simply
trigger live query re-executions based on the Query object type root fields. E.g. a query with a
selection set selection on the Query.viewer field could be re-executed by emitting the
Query.viewer event through the live query store event emitter. However, the viewer could be a
completely different record/resource for each client that consumes the given query operation.

To be more clear here is the corresponding schema:

type User {
  id: ID!
  login: String!
}

type Query {
  """
  Returns the authenticated user. Returns null in case the user is not authenticated.
  """
  viewer: User
  """
  List of the users that are currently online.
  """
  onlineUsers: [User!]!
}

type Mutation {
  updateLogin(newLogin: String!): Boolean!
}
Enter fullscreen mode Exit fullscreen mode
query viewer @live {
  viewer {
    id
    login
  }
}
Enter fullscreen mode Exit fullscreen mode

Let's see how the implementation for this could look like:

const Query = {
  viewer: (source, args, context) => {
    return context.viewer
  }
}

const Mutation = {
  updateLogin: async (source, args, context) => {
    await context.db.updateUser(context.viewer.id, args.newLogin)

    context.liveQueryStore.triggerUpdate(`Query.viewer`)
    return true
  }
}
Enter fullscreen mode Exit fullscreen mode

If a specific user updates his login we shouldn't invalidate and re-execute any live query operation
that has a viewer selection set for any connected user who might not even be affected by that
change!

All queries are invalidated

At the same time, the user could also be referenced in another operation e.g. a list of all
available users (Query.onlineUsers). The Query.viewer event would not cover and schedule a
re-execution for operations that select the user via that field.

There Must Be a Better Solution for Uniquely Identifying the Selection Set Data

As you probably noticed the user has an id field of the ID! (nonnull id) type. This is a
commonly used field for uniquely identifying a resource on the client-side. Apollo-client uses the
__typename field in combination with the id field as the default resource cache key (User:1),
Relay goes a step further and already assumes that the resource type is already encoded (e.g.
base64("User:1") Note:
You are not forced to use base64 🤔) inside the id and
therefore only uses the id field.

What if we could also use such an identifier on the server-side in our live query store
implementation?

My current implementation just traversed the AST of the query operation and extracted the
schema coordinates on the root query type. E.g.
Query.viewer for the viewer live query operation from above.

Live Query Store with schema coordinates on operation context

However, in case we would want to identify the user via the id we must also add something like
User:1 to the set of resources the live query operation selects. This requires schema knowledge as
the live query store needs to know which type has an id field and if included in the selection set,
gather the corresponding resource identifier.

Live Query Store with resource identifiers and schema coordinates on operation context

As mentioned above this allows more granular query invalidations.

More granular updates are now possible

The first drawback I had in mind is that if an operation does not specify the id field on the
selection set, the resource cannot be tracked by the live query store.

However, most operations will probably select the id field as it is most likely used on the client
for the cache keys.

Furthermore, it could be possible to simply transform the query in such a way that the id field is
added to the selection set (similar to how apollo-client is by default adding a __typename
selection to each object type).

For keeping things simple I decided to push the responsibility for selecting the id field to the
client that sends the live query operation. I also could not find a use-case in my existing
application where there was no id selection for a resource 👍.

Implementing the Resource Identifier Collector

The next obstacle is to decide how the ids are extracted and I had two options in mind.

1. Traversing the GraphQL Execution Result Tree

This simply seemed complicated to me as I would need to traverse the whole result while somehow
guessing/checking the type of each leaf based on the operation AST and the schema. I quickly dropped
that idea.

2. Manually Register the Resource Identifier by Calling a Function That Is Injected via the Context

The goal of my live query store implementation is to add live query support to any schema with
minimal effort. Passing something along-side the context that a library user must call inside a
query resolver seemed wrong and all this should be an implementation detail the library user should
not care about.

Imagine if we had to register a resource manually in each resolver that returns an object type.

const Query = {
  viewer: (source, args, context) => {
    const viewer = context.viewer
    context.registerResource(`User:${viewer.id}`)
    return viewer
  }
}
Enter fullscreen mode Exit fullscreen mode

It might seem quite simple for a single resolver, however, it can quickly clutter and lead to bugs
if we have to manually do that for any resource in any resolver.

Ideally a library user will just have to add a context.liveQueryStore.triggerUpdate("User:1") line
to the updateLogin mutation field resolver in order to magically schedule an operation
re-execution, without the overhead of adding an additional function call to each resolver.

const Query = {
  viewer: (source, args, context) => {
    // No tracking registration code here.
    return context.viewer
  }
}

const Mutation = {
  updateLogin: async (source, args, context) => {
    await context.db.updateUser(context.viewer.id, args.newLogin)

    context.liveQueryStore.triggerUpdate(`User:${context.viewer.id}`)
    return true
  }
}
Enter fullscreen mode Exit fullscreen mode

So, I thought more about how this could be implemented in a less verbose way.

As any other field, the id field has a resolver (either the default resolver provided by GraphQL
or a user-defined resolver), so if there was a way to wrap each id field resolver with a function
that could solve the issue. The wrapper could call the actual resolver, register the resource, and
then return the value. The user won't have to care about anything (besides adding the id field to
the selection set of the query).

The best library for transforming and modifying GraphQL Schemas is
graphql-tools. Fortunately, it is now maintained by
The Guild, as apollo abandoned it and was maintained pretty poorly.

So I dug a bit into the fancy documentation and found @graphql-tools/wrap.

A quick excerpt from the documentation:

Schema wrapping is a method of making modified copies of GraphQLSchema objects, without changing
the original schema implementation.

Looks perfect, but since in this specific case I could potentially modify/map an existing schema, I
digged a bit further, chatted with @yaacovCR and learned about
mapSchema from @graphql-tools/utils. It maps an existing schema into a new schema while applying
some rules (kind of similar how Array.map() works 🤔).

Using mapSchema might be slightly faster as there will be no delegation to a second schema.

So the final plan is the following: We map the existing schema and warp the resolver fields append
the resource identifiers to a Set.

import {
  defaultFieldResolver,
  execute,
  GraphQLOutputType,
  GraphQLScalarType,
  GraphQLSchema,
  isNonNullType,
  isScalarType
} from 'graphql'
import { MapperKind, mapSchema } from '@graphql-tools/utils'

const ORIGINAL_CONTEXT_SYMBOL = Symbol('ORIGINAL_CONTEXT_SYMBOL')

const isNonNullIDScalarType = (type: GraphQLOutputType): type is GraphQLScalarType => {
  if (isNonNullType(type)) {
    return isScalarType(type.ofType) && type.ofType.name === 'ID'
  }
  return false
}

// invokes the callback with the resolved or sync input. Handy when you don't know whether the input is a Promise or the actual value you want.
export const runWith = <T>(input: T | Promise<T>, callback: (value: T) => void) => {
  if (input instanceof Promise) {
    input.then(callback, () => undefined)
  } else {
    callback(input)
  }
}

const addResourceIdentifierCollectorToSchema = (
  schema: GraphQLSchema,
  idFieldName: string
): GraphQLSchema =>
  mapSchema(schema, {
    [MapperKind.OBJECT_FIELD]: (fieldConfig, fieldName, typename) => {
      const newFieldConfig = { ...fieldConfig }

      let isIDField = fieldName === idFieldName && isNonNullIDScalarType(fieldConfig.type)
      let resolve = fieldConfig.resolve ?? defaultFieldResolver

      newFieldConfig.resolve = (src, args, context, info) => {
        if (!context || ORIGINAL_CONTEXT_SYMBOL in context === false) {
          return resolve(src, args, context, info)
        }

        const collectResourceIdentifier: ResourceIdentifierCollectorFunction =
          context.collectResourceIdentifier
        context = context[ORIGINAL_CONTEXT_SYMBOL]
        const result = resolve(src, args, context, info)

        if (isIDField) {
          runWith(result, (id: string) => collectResourceIdentifier({ typename, id }))
        }
        return result
      }

      return newFieldConfig
    }
  })
Enter fullscreen mode Exit fullscreen mode

The implementation for executing the operation is similar to the following:

const newIdentifier = new Set(rootFieldIdentifier)
const collectResourceIdentifier: ResourceGatherFunction = ({ typename, id }) =>
  // for a relay spec conform server the typename could even be omitted :)
  newIdentifier.add(`${typename}:${id}`)

// You definitely wanna cache the wrapped schema as you don't want to re-create it for each operation :)
const wrappedSchema = addResourceIdentifierCollectorToSchema(schema)

const result = execute({
  schema: wrappedSchema,
  document: operationDocument,
  operationName,
  rootValue,
  contextValue: {
    [ORIGINAL_CONTEXT_SYMBOL]: contextValue,
    collectResourceIdentifier
  },
  variableValues: operationVariables
})
Enter fullscreen mode Exit fullscreen mode

I had to wrap the "user" context in a context (context-ception 🤯) on which I also attached the
function for adding the resource identifier to the resource identifier set. I got inspired for this
by the apollo-server source code, as I knew it has a way for measuring resolver execution time,
which must be done on a request/operation basis similar to the resource identifier collection. This
method allows using a new function/context for each execution. Inside the field resolver, the
correct user context is then passed into the actual (user) field resolver.

Now after the operation has been executed against the schema the newIdentifier Set should contain
the identifiers of all the resources that were resolved during the operation execution.

The live query store can now use that information for re-executing queries once a resource
identifier event is emitted 👌.

Conclusion

Identifying resources and invalidating queries based on a resource basis rather than a query root
field basis allows more efficient query re-executions and can avoid pushing unnecessary updates to
clients.

GraphQL Tools is a super handy library that can be used for solving a huge variety of problems. I am
glad it got such a huge update and good documentation!

The implementation probably won't cover all use-cases. What if a client is not authenticated and the
Query.viewer resolver returns null. There is no User:ID string available on the live query
store operation context once the user has authenticated. Either a Query.viewer update must be
emitted through the live query store emitter (which will affect ANY client operation that selects
the viewer), the client must re-execute the operation after login or the live query store must
somehow be notified to re-execute all operations of the user that just authenticated.

In case you are interested in the source code for the implementation
check it out on GitHub

There is still more to discover and build in live query land!

We still need to manually notify the live query store that a resource must be invalidated. An
abstraction for doing this behind the scenes could vastly differ for different stacks.

Maybe the ORM/database store layer could emit the events or a proxy could emit those events based on
database operations such as INSERT, DELETE, and UPDATE.

Re-executing a query operation is nice and smart, but not the most efficient solution. What if we
could only re-execute certain resolvers? I already have some ideas in mind, and I will probably
write about that as well!

Check out this super cool talk about live queries @ Facebook!

Check out this super cool talk about live queries @ Samsara!

I also wrote an article about my Socket.io GraphQL Server Engine implementation!

💖 💪 🙅 🚩
theguild_
TheGuildBot

Posted on July 29, 2021

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related