The Myth of GraphQL

ASafaeirad

Posted on October 15, 2024

It's often said that GraphQL fixes the problems of under-fetching and over-fetching. But is that really the case? In theory, it sounds promising. In practice, however, you might be trading problems for a pile of new ones that even the most sophisticated frameworks struggle to solve.

The Temptation to Put Everything in a Single Request

Imagine you're at an all-you-can-eat buffet, and someone advises you,

Just load up your plate with everything you might possibly want in one go; it'll save you trips!

Sounds efficient, right? That's akin to what GraphQL suggests:

Pack as much data as you need into a single request to avoid under-fetching, and I'll let you specify exactly what you want to prevent over-fetching.

Let's follow the advice!
To achieve this in a React application, we might hoist our data-fetching logic up to the highest level and pass down this huge data object to our presentational components.

Here's an example of the data type we get for a query using Hasura and GraphQL-Codegen:

export type ProjectQuery = {
  __typename?: 'query_root',
  project_by_pk?: {
    __typename?: 'project',
    id: string,
    name: string,
    description?: string | null,
    status: SchemaTypes.ProjectStatusEnum,
    start_date?: string | null,
    due_date?: string | null,
    created_at: string,
    updated_at: string,
    households: Array<{
      __typename?: 'household_project',
      household: {
        __typename?: 'household',
        id: string,
        name: string,
        status: SchemaTypes.HouseholdStatusEnum,
        severity: SchemaTypes.HouseholdSeverityEnum,
        code?: string | null,
        created_at: string,
        updated_at: string,
        members_count?: number | null
      }
    }>
  } | null
};

Now, instead of neat, modular components with their own data queries, we have a massive, monolithic object—with an ad-hoc schema full of random nulls.

Step 1: We've just sacrificed co-location! But on the bright side, we have presentational components instead 🎉

The Quest for a Meaningful Schema

You're right to say: "Why is the schema ad-hoc? That's a skill issue. Can't we make some meaningful entities?"
One approach is to create fragments like ProjectStatusFragment, HouseholdIdentityFragment, and HouseholdMembersFragment, and enforce their usage across the team.
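
For illustration, the fragment approach might look roughly like this (the fragment names and field choices are hypothetical):

import { gql } from '@apollo/client';

// Shared, team-enforced fragments (sketch)
const HOUSEHOLD_IDENTITY = gql`
  fragment HouseholdIdentityFragment on household {
    id
    name
    code
  }
`;

const HOUSEHOLD_MEMBERS = gql`
  fragment HouseholdMembersFragment on household {
    members_count
  }
`;

const PROJECT = gql`
  query Project($id: uuid!) {
    project_by_pk(id: $id) {
      id
      name
      households {
        household {
          ...HouseholdIdentityFragment
          ...HouseholdMembersFragment
        }
      }
    }
  }
  ${HOUSEHOLD_IDENTITY}
  ${HOUSEHOLD_MEMBERS}
`;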

But wait—do we need all the data behind these fragments every time? Fragments are meant to be reusable, but reusability can lead to over-fetching, which contradicts GraphQL's main promise:

Query exactly what you need on the client—no more, no less.

In the real world, use cases are infinite. To create meaningful fragments without over-fetching, we'd need an infinite number of fragments. That's neither practical nor efficient. So we default back to flexible, ad-hoc schemas, letting each use case decide what data it needs.

This leads us back to square one with a lesson:

Every layer of abstraction and reuse introduces over-fetching, which contradicts GraphQL's core promise.

The Problem of Nulls

Why are there so many random nulls in our data? The answer lies in GraphQL's design decisions regarding nullability:

TL;DR

In GraphQL, every field and every type is nullable by default. ... By defaulting every field to nullable, a failure in any field may result in just that field returning "null" rather than a complete failure for the request.

This means our schemas are riddled with optional fields, leading to a data structure filled with nulls. It's not necessarily a bad design choice, but it's the reality when we work with GraphQL.
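
In component code, that translates into a null check at every level. A minimal sketch of what consuming the ProjectQuery above looks like (NotFound is a placeholder component):

// Inside a component consuming ProjectQuery (sketch):
const project = data?.project_by_pk;   // the whole project may be null
if (!project) return <NotFound />;     // placeholder fallback

// Every optional scalar needs its own guard:
const dueDate = project.due_date ? new Date(project.due_date) : undefined;
const membersCount = project.households[0]?.household.members_count ?? 0;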

Returning to the Core Issue

Now, without any skill issues, we're stuck with a massive data object with an ad-hoc, partial schema. We need to pass this data to our presentational components, but how?

Option 1: Prop Drilling

One option is prop drilling. But is it practical to pass such a data schema without losing our sanity? Not really.

Consider the purpose of presentational components: they are free of side effects, loosely coupled, and therefore reusable and easy to test. By passing down this enormous, loosely typed object, we're tightly coupling our components to a specific query structure.

type Props = {
  households: Array<{
    __typename?: 'household_project',
    household: {
      __typename?: 'household',
      id: string,
      name: string,
      status: SchemaTypes.HouseholdStatusEnum,
      severity: SchemaTypes.HouseholdSeverityEnum,
      code?: string | null,
      created_at: string,
      updated_at: string,
      members_count?: number | null
    }
  }> | null
};

const HouseholdList = ({ households }: Props) => { /* ... */ };

Tight dependency isn't just about what a component uses or imports. In software development, dependency means "What information is this part of the code aware of?" When a piece of code is aware of specific information, it becomes responsible for reacting whenever that information changes. This means our HouseholdList component isn't just using the data; it is coupled to the exact structure of our query results. As a result, any change in the query triggers a change in our component's high-level API.

Is it tightly coupled? Absolutely.
Is it easy to test? Not at all.
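
To see the testing cost, consider what even a minimal fixture looks like (a sketch; the enum members below are hypothetical):

import { render } from '@testing-library/react';

// Even a "presentational" component needs the full query shape as a fixture:
const fixture: Props = {
  households: [
    {
      __typename: 'household_project',
      household: {
        __typename: 'household',
        id: '1',
        name: 'Smith household',
        status: SchemaTypes.HouseholdStatusEnum.Active,   // hypothetical member
        severity: SchemaTypes.HouseholdSeverityEnum.Low,  // hypothetical member
        code: null,
        created_at: '2024-01-01T00:00:00Z',
        updated_at: '2024-01-01T00:00:00Z',
        members_count: null,
      },
    },
  ],
};

render(<HouseholdList households={fixture.households} />);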

Presentational components aren't free. They depend on their parent components to handle responsibilities and side effects like data fetching. By shifting this responsibility away from the components themselves, we introduce duplication. Every time we reuse these components in different contexts, we have to replicate the same data-fetching logic in their parent components.

In this scenario, we get the worst of both worlds: we don't reap the benefits of presentational components, but we still pay the costs.

And let's not forget, our data is littered with nulls. The bigger question is: should our components accept nullable values just because our I/O isn't reliable?

Here's the next lesson:

Passing a raw query result as props couples our components to the unpredictability of I/O.

Searching for Meaningful Interfaces

To untangle this mess, we might try to create meaningful, decoupled interfaces. We'll map our unwieldy data to what each component needs, embracing abstraction.

But here's the kicker: good abstraction clashes with the "just query what you need" approach.

Why?

Let's attempt to create a Project entity and a mapper function:

type Project = {
  id: string;
  name: string;
  dueDate?: Date;
}

function toProject(data: X): Project { /* ... */ }

But what is X? If we assume it's the generated Project type from our GraphQL schema, we're in trouble. Consider this query:

const { data } = useQuery(gql`{ projects { id, dueDate } }`);

This data lacks the fields needed to map to our Project entity. We can't reliably map partial data to a full entity without risking runtime errors or inconsistent state.
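
To make the failure concrete, here is roughly what the compiler sees (GeneratedProject is a stand-in for the codegen output; the error comment is paraphrased):

// GeneratedProject stands in for the full codegen output (hypothetical):
type GeneratedProject = {
  id: string;
  name: string;
  dueDate?: string | null;
};

// What `{ projects { id, dueDate } }` actually returns:
type ProjectsQueryResult = {
  projects: Array<{ id: string; dueDate?: string | null }>;
};

declare const data: ProjectsQueryResult;
declare function toProject(data: GeneratedProject): Project;

// ❌ Property 'name' is missing in type '{ id: string; dueDate?: string | null }'
const project = toProject(data.projects[0]);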

Option 2: Using Context

Okay, maybe crafting meaningful interfaces is off the table, but we can prevent our component interfaces from getting polluted by avoiding prop drilling altogether. "Aha! We'll use React's Context API!" We set up a provider and pass our data through context!

const MyContext = createContext(null);

const Page = () => {
  const query = usePageQuery();

  return (
    <MyContext.Provider value={query}>
      <MyChildren />
    </MyContext.Provider>
  );
};

const MyChildren = () => {
  const { data, loading, error } = use(MyContext);
};

But hold on, aren't we just coupling MyChildren to usePageQuery via context? The coupling is less transparent because it happens through dependency injection, but we'll get to that in a second. There's a bigger problem: Apollo Client already provides a cache through ApolloProvider, so we're adding a redundant layer on top of it.

Simplifying our code, we might write:

const Page = () => {
  usePageQuery();
  return <MyChildren />;
};

const MyChildren = () => {
  const { data } = usePageQuery({ fetchPolicy: "cache-only" });
  // Component logic
};

Now you see me! Context doesn't solve our fundamental problem; it just obscures it.

The Challenge of Render-As-You-Fetch

In many cases, we don't need all the data upfront to start rendering. When we combine everything into one huge request, we make it harder to render parts of our application as soon as their data arrives.

Yes, we can use directives like @defer, but implementing them adds layers of complexity to both the client and server.
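
For reference, deferring part of the query might look something like this (a sketch, assuming both client and server support the @defer directive):

import { gql } from '@apollo/client';

// The household list arrives in a later chunk; the project shell renders first.
const PROJECT = gql`
  query Project($id: uuid!) {
    project_by_pk(id: $id) {
      id
      name
      ... @defer {
        households {
          household {
            id
            name
          }
        }
      }
    }
  }
`;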

Additionally, sometimes we need different strategies for different data. For instance, we might want to render part of the data on the server and the rest on the client. In this case, we need to break our query into at least two separate queries. (Did I just miss dynamic and static data? 🤔)

const Page = () => {
  const serverQuery = useServerPageQuery();
  const clientQuery = useClientPageQuery({ ssr: false });
  /* ... */
}


Cache Invalidation: The Hidden Beast

When we mutate data, we need to update our cache. Sometimes, optimistic updates and manual cache manipulation aren't feasible. The safest route is often to refetch.

But with our all-in-one query, refetching means fetching the entire dataset again—a heavy, inefficient operation.
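
With Apollo, the safe-but-heavy version looks something like this (a sketch; RENAME_HOUSEHOLD and the 'Project' query name are hypothetical):

import { useMutation } from '@apollo/client';

const [renameHousehold] = useMutation(RENAME_HOUSEHOLD, {
  // The safe route: refetch the entire all-in-one page query,
  // even though only one household's name changed.
  refetchQueries: ['Project'],
});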

Is there a solution? Perhaps, but it would require sophisticated infrastructure that goes beyond what most app developers should implement. We're talking about systems that can intelligently manage partial cache invalidation.

The Cost of Chasing Zero Over-Fetching and Under-Fetching

Let's tally up the costs of striving for zero over-fetching and under-fetching:

  • Coupled Presentational Components
  • No Co-location
  • Low Signal-to-Noise Ratio: Massive generated types and null handling clutter our codebase.
  • Complex Render Strategies
  • Cache Management Nightmares


Is it worth it?

A Reality Check

In practice, many teams abandon the ideal of crafting minimal, all-encompassing queries. Instead, they opt for smaller, reusable data-fetching hooks like useUser, useComments, and useWhatever. They also leverage fragments to promote reusability and define cohesive entities within their GraphQL schemas.
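
In code, the pragmatic fallback tends to look like this (a sketch; the names are illustrative):

import { gql, useQuery } from '@apollo/client';

const USER = gql`
  query User($id: uuid!) {
    user_by_pk(id: $id) {
      id
      name
      avatar_url
    }
  }
`;

// A small, reusable hook: convenient, but no longer
// "query exactly what this screen needs".
function useUser(id: string) {
  return useQuery(USER, { variables: { id } });
}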

But wasn't GraphQL's main selling point that it's a query language for the client—allowing us to request data in exactly the shape we need? Yet, in practice, we're using it more like a simple SDK, making straightforward data requests. Aren't we just replicating what could be achieved with RPC or REST calls, but with added complexity?

And yes, I recognize that GraphQL isn't inherently bad. It solves certain problems more effectively than other solutions, offering flexibility, strong typing, and a unified interface for data fetching. But I believe we app developers should rethink what we truly gain from GraphQL before adopting it.

If you're a tech giant like Facebook, equipped to build and maintain the sophisticated frameworks required to harness GraphQL's full potential, then by all means, leverage it.

However, for most small to medium-sized enterprises, adopting GraphQL without the necessary resources leads to complexity and frustration. Based on my experience, it often results in a tangled mess rather than streamlined data management.
