Join Data Across APIs

slickstef11

Stefan 🚀

Posted on January 11, 2022

One way to think about APIs is to see them as lego blocks. They might be (Micro-)Services within your company or APIs from a third party, but in the end, they're just lego blocks that solve specific problems.

The number of lego blocks being created is constantly growing, which leads to a few problems.

How do you make all your lego blocks easily accessible? How should you store them? How do you categorize them? Additionally, how can you combine data from different blocks (APIs) in an easy way?

In the end, all you want to do is build a small house with your blocks, just like my son. But there's a catch. APIs are not "normed" lego blocks. They don't easily fit together, yet!

During the last year, we've been working on a protocol and execution engine to "normalize" APIs. Using the WunderGraph framework, you can transform any API (GraphQL, REST, PostgreSQL, MySQL, ...) into a "normed" lego block.

Additionally, we've recently announced the closed Beta of the WunderHub, a place to share your normed lego blocks.

This means, it's about time to solve the third problem, JOINing data across APIs! This is what we're going to talk about in this post.

WunderGraph allows you to join data from different APIs from within a GraphQL Query. You don't need to write any logic, resolvers or create a custom schema. Just Query the data you need and join it across different APIs.

Before we dive into our solution, let's explore other options to join data across APIs.

You can join data in the client or on the server using custom integration logic. You could do the join in your database. Finally, we'll cover Apollo Federation and Schema stitching.

To make this post a bit more applicable, we're using an example scenario where we use two GraphQL APIs and join them together: The first returns the capital of a country, the second returns the weather for the city. Combined, we get the capital of the country and the weather for the city so we can decide where to go for our next trip.

Client-Side Application-Level Joins

First, you need a GraphQL client that allows multi tenancy, that is, one that can talk to more than one API. This is not a given, because many GraphQL clients are designed to work with a single GraphQL API.

Then we define the two Queries, one to fetch the capital, the other to get the weather data. From the result of the first Query, we use the name of the capital to fetch the weather data.

Finally, we combine the two results and get our desired result.
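
The three steps above can be sketched in TypeScript. This is a minimal sketch rather than the API of a specific client library: `GraphQLFetcher`, `joinCapitalWithWeather` and the result shape are assumptions made for illustration, while the field names follow the Countries and Weather APIs used in this post. Each API gets its own fetcher, which is the "multiple APIs in one client" requirement in practice.

```typescript
// A fetcher sends one GraphQL query to ONE API and resolves to the `data` part.
// Both fetchers are injected, so the sketch does not depend on a specific client library.
type GraphQLFetcher = (query: string, variables: Record<string, unknown>) => Promise<any>;

interface CapitalWeather {
  country: string;
  capital: string;
  weatherTitle: string;
  temperature: number;
}

const CAPITAL_QUERY = `query ($code: ID!) { country(code: $code) { name capital } }`;
const WEATHER_QUERY = `query ($name: String!) {
  getCityByName(name: $name) { weather { summary { title } temperature { actual } } }
}`;

async function joinCapitalWithWeather(
  countriesApi: GraphQLFetcher,
  weatherApi: GraphQLFetcher,
  code: string,
): Promise<CapitalWeather> {
  // Step 1: fetch the capital from the first API.
  const { country } = await countriesApi(CAPITAL_QUERY, { code });
  // Step 2: use the capital as the input of the second Query.
  const { getCityByName } = await weatherApi(WEATHER_QUERY, { name: country.capital });
  // Step 3: combine both results into one object.
  return {
    country: country.name,
    capital: country.capital,
    weatherTitle: getCityByName.weather.summary.title,
    temperature: getCityByName.weather.temperature.actual,
  };
}
```

Note that the second request cannot start before the first one has returned, so the two round trips always add up.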

The solution is simple and doesn't require any additional backend. You can deploy the application to a CDN almost for free.

On the downside, some GraphQL clients struggle to talk to multiple APIs. If you want type safety, you need to generate types for two schemas. It's not an ideal solution to use multiple GraphQL APIs in a single client application.

Another issue is the added latency of N+1 joins. Joining a single country with its weather might be fast, but what if we have to join 60 capitals? We'd have to make one request for the countries plus one request per capital. That's a lot of round trips, which could take a long time and is probably not the best user experience.
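
To make the N+1 cost concrete, here is a small sketch that joins a whole list of countries and counts the round trips. All names (`joinAllCapitals`, `ListCountries`, `GetWeather`) are hypothetical, and both lookups are injected so the example runs without a real API.

```typescript
interface Country {
  code: string;
  capital: string;
}

interface CityWeather {
  title: string;
}

// Hypothetical lookups; in a real client each call would be one HTTP request.
type ListCountries = () => Promise<Country[]>;
type GetWeather = (city: string) => Promise<CityWeather>;

async function joinAllCapitals(listCountries: ListCountries, getWeather: GetWeather) {
  let roundTrips = 0;

  // 1 request: fetch the list of countries with their capitals.
  roundTrips += 1;
  const countries = await listCountries();

  // N requests: one weather lookup per capital.
  // Promise.all lets them overlap in time, but each one is still a round trip.
  const rows = await Promise.all(
    countries.map(async (country) => {
      roundTrips += 1;
      const weather = await getWeather(country.capital);
      return { ...country, weather };
    }),
  );

  return { rows, roundTrips };
}
```

For 60 countries this adds up to 61 requests from the browser. Running them concurrently hides some of the latency, but browsers limit the number of parallel connections, and every request still hits the upstream API.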

Server-Side Application-Level Joins

Another solution would be to move this logic to the server. Instead of using multiple GraphQL clients in our client application, we move them to our backend and expose the whole operation as a REST API.

The logic is the same as above, but moving it to the server comes with a few advantages as well as drawbacks.

First, the client gets a lot simpler. It makes a single REST API call to fetch the data. No GraphQL client is needed; you can just use "fetch" from the browser.

However, we now have to run a backend to fetch the data and combine it. So we need to figure out a backend stack and need to decide how and where to deploy it. You also can't just put your backend on a CDN, so this solution will cost you something.

You might use a third-party service like AWS Lambda or Google Cloud Functions, but even then, you have to write the code, maintain it, deploy it, etc...
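
As a rough idea of the code you'd have to write and maintain, here is a framework-agnostic handler that could sit behind an HTTP server or a cloud function. It is a sketch under assumptions: `handleWeatherRequest`, `HandlerResult`, `Upstreams` and the `/weather` path are hypothetical names, and the two upstream lookups are injected instead of calling the real APIs.

```typescript
interface HandlerResult {
  statusCode: number;
  body: string;
}

// Hypothetical wrappers around the two GraphQL APIs.
interface Upstreams {
  getCapital: (code: string) => Promise<string | null>;
  getWeather: (city: string) => Promise<{ title: string; temperature: number }>;
}

// Handles e.g. GET /weather?code=DE and returns the joined data as JSON.
async function handleWeatherRequest(rawUrl: string, upstreams: Upstreams): Promise<HandlerResult> {
  // The base URL is only needed to parse relative request paths.
  const code = new URL(rawUrl, "http://localhost").searchParams.get("code");
  if (!code) {
    return { statusCode: 400, body: JSON.stringify({ error: "missing query parameter: code" }) };
  }

  // First lookup: the capital of the country.
  const capital = await upstreams.getCapital(code);
  if (capital === null) {
    return { statusCode: 404, body: JSON.stringify({ error: `unknown country: ${code}` }) };
  }

  // Second lookup: the weather for that capital, then combine both results.
  const weather = await upstreams.getWeather(capital);
  return { statusCode: 200, body: JSON.stringify({ code, capital, weather }) };
}
```

The join itself is only a few lines. Input validation, error handling, deployment and monitoring are what make this approach more work than it looks at first.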

To summarize, the server-side solution is a bit more complex, but this complexity also comes with a few advantages.

For example, it's now possible to cache the response across client requests and even use single-flight to fetch the weather only once, even if multiple clients request the same country at the same time.
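
Caching combined with single-flight can be sketched in a few lines. This is a minimal in-memory illustration, assuming a single server process; `singleFlightCache` and its parameters are hypothetical names, not part of any library mentioned in this post.

```typescript
// Wraps an async loader so that concurrent calls for the same key share one
// in-flight request (single-flight) and later calls reuse the cached value.
function singleFlightCache<T>(load: (key: string) => Promise<T>, ttlMs: number) {
  const inFlight = new Map<string, Promise<T>>();
  const cache = new Map<string, { value: T; expiresAt: number }>();

  return async function get(key: string): Promise<T> {
    // 1. Serve from the cache while the entry is still fresh.
    const hit = cache.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;

    // 2. Join a request that is already on its way.
    const pending = inFlight.get(key);
    if (pending) return pending;

    // 3. Otherwise start exactly one upstream request for this key.
    const request: Promise<T> = load(key).then(
      (value) => {
        inFlight.delete(key);
        cache.set(key, { value, expiresAt: Date.now() + ttlMs });
        return value;
      },
      (error) => {
        // Failed requests are not cached, so the next call retries.
        inFlight.delete(key);
        throw error;
      },
    );
    inFlight.set(key, request);
    return request;
  };
}
```

Wrapping the weather lookup this way, for example `const getWeather = singleFlightCache(fetchWeatherForCity, 60_000)` with a hypothetical `fetchWeatherForCity`, means that many clients asking for the same capital within a minute trigger a single upstream request. This only works because the server sees the requests of all clients, which a browser-side join cannot do.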

Database Joins

Another way of joining data, probably the most widely known, is to use a database. Although a database join is not really suitable to combine the responses of APIs, it's still worth mentioning here.

PostgreSQL, for example, has the concept of Foreign Data Wrappers (FDW). There are ways to use an FDW to join a table with data from another database or even with the result of an HTTP call.

There might be use cases where FDW is suitable, but in general we'd advise against it. Ideally, we keep business logic out of the database and move it into a middleware or the client.

Apollo Federation

Another solution to join data from multiple APIs is to use Apollo Federation. Apollo Federation allows you to define the composition of multiple GraphQL (Micro-)Services from within the GraphQL Schema.

The idea of Federation is to have "one single GraphQL Schema" across the whole organization. An API Gateway that supports Federation will then distribute the requests to the different services.

WunderGraph doesn't just support Apollo Federation as a DataSource. We're also the only service capable of handling GraphQL Subscriptions for Federated APIs.

Federation is a great solution to build GraphQL Microservices at scale. That said, we've found that one single GraphQL Schema is not realistic in a real-world scenario.

Federation works great "within" a single Organization, but what about integrations across companies?

In a federated Graph, all services must be aware of each other. That is, all services must be able to contribute to the same GraphQL Schema, meaning that there needs to be communication between all stakeholders of the Graph. Without this communication, the Graph might not "compile" because of naming conflicts or inconsistencies.

Within a single organization, it's already a challenge to scale a single Graph, but it's possible because you can force your own people to collaborate and communicate.

However, you cannot expect other companies to respect your naming conventions. Ultimately, Federation is not a solution to build API relationships across boundaries that you don't own.

From our perspective, Federation is a great solution to build GraphQL Microservices, which is why we support it in WunderGraph, but it's only one of the many tools available to solve the problem.

Coming back to our above example, the two APIs unfortunately don't implement the Federation specification. In fact, no publicly known GraphQL API supports Federation because it's usually only used internally and then exposed as a single composed SuperGraph.

Schema Stitching

As we've learned before, Federation is not a solution to implement joins across organizations / Graphs.

Schema stitching, in contrast to Federation, is a centralized solution to facilitate JOINs across GraphQL APIs. While Federation encourages sharing the JOIN configuration across all services that belong to a Graph, Schema stitching moves this logic into a single centralized service.

This means, services that are being stitched together don't actually know about each other. They are fully separated from each other and unaware that they're being stitched together.

This method allows for JOINs across organizations, even without any sort of communication between the stakeholders. The "stitch" service in this case is a centralized GraphQL server that decides what the final Graph will look like. If there are naming conflicts, the stitch service has to resolve them. The stitch service can also rename fields, add new fields, remove fields, and even change the type of a field.

Compared to the other solutions, it's a simple way to combine multiple GraphQL Services into a new GraphQL API without having to go the "hard way" of building a REST API on top.

The benefit is that the result is a valid GraphQL API that can be consumed by any GraphQL client. This benefit comes at the cost that the stitching service needs to be maintained and deployed. If you're scaling schema stitching, you might run into bottlenecks if too many people or teams contribute to a stitched service.

If you've got a small team and want to stitch your internal service with another API from a 3rd party, schema stitching might be an excellent solution.

The big drawback of schema stitching though is that you have to maintain another GraphQL Schema and the stitching definition. Tooling has improved recently to make this easier, but it can still be a challenge at scale.

WunderGraph: GraphQL Query Joins

We've looked at the GraphQL landscape for a while and observed how others have implemented JOINs. The most popular approaches have been discussed above.

Looking at these existing solutions, we've always felt that they add a lot of complexity. We wanted to find an easier way to JOIN data across APIs, so we've started experimenting.

For a long time, we thought that the solution needs to be to JOIN the APIs in the GraphQL Schema. This might sound obvious because it's the default way of thinking. When we talk about API design in GraphQL, we're talking about the GraphQL Schema.

But "integrating" APIs in the GraphQL Schema means complexity, as we've seen in the approaches above.

It took us a while, but we finally realized that with WunderGraph, you can actually JOIN APIs from within the GraphQL Operation. There's no need to use Federation or Stitching, just write a GraphQL Query with some small additions.

Why is this possible? It's possible because WunderGraph does one thing fundamentally differently from all other GraphQL tools. WunderGraph is a Server-Side Only GraphQL Solution. We're not exposing a GraphQL API. Instead, we're compiling GraphQL Operations into JSON REST(ish) APIs and generating a typesafe client on top of that.

WunderGraph feels like you're using GraphQL, it looks like you're using GraphQL, but it's not. We're just using GraphQL as a "virtual Graph" to integrate the APIs and expose a REST API.

So, what does the solution look like?

First, we need to add the two APIs to our project:

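// wundergraph.config.ts
import { introspect } from "@wundergraph/sdk";

// Countries API: all types and root fields get the "countries" prefix.
const countries = introspect.graphql({
    apiNamespace: "countries",
    url: "https://countries.trevorblades.com/",
});

// Weather API: all types and root fields get the "weather" prefix.
const weather = introspect.graphql({
    apiNamespace: "weather",
    url: "https://graphql-weather-api.herokuapp.com/",
});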

We introspect the two APIs and namespace them. If you want to learn more about namespacing and how it helps us to avoid naming conflicts, please check out the Namespacing Docs.
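
To illustrate why namespacing avoids conflicts, here is a toy model. It is not how WunderGraph implements namespacing: `namespaceFields` and `mergeRootFields` are hypothetical helpers that work on plain lists of root field names instead of real schemas, and the `search` field is made up to provoke a conflict. The prefix convention mirrors the `countries_country` and `weather_getCityByName` fields used in this post.

```typescript
// Prefix every root field with the namespace of its API,
// e.g. country -> countries_country.
function namespaceFields(apiNamespace: string, rootFields: string[]): string[] {
  return rootFields.map((field) => `${apiNamespace}_${field}`);
}

// Merge the root fields of several APIs into one "virtual Graph".
// Two APIs that use the same field name cannot be merged.
function mergeRootFields(...apis: string[][]): string[] {
  const merged = new Set<string>();
  for (const fields of apis) {
    for (const field of fields) {
      if (merged.has(field)) {
        throw new Error(`naming conflict on root field "${field}"`);
      }
      merged.add(field);
    }
  }
  return Array.from(merged);
}

// "search" is a made-up field that both APIs happen to define.
const countriesFields = ["country", "search"];
const weatherFields = ["getCityByName", "search"];

// Without namespaces, mergeRootFields(countriesFields, weatherFields) throws.
// With namespaces, the same two APIs merge without a conflict:
const virtualGraph = mergeRootFields(
  namespaceFields("countries", countriesFields),
  namespaceFields("weather", weatherFields),
);
console.log(virtualGraph);
```

Because the prefix is derived from the API and not from its content, no coordination between the owners of the two APIs is required.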

Now that we've got the two APIs added to our "virtual Graph", let's define our REST API by writing a GraphQL Query.

# Weather.graphql
query (
    $code: ID!
    $capital: String! @internal
){
    countries_country(code: $code){
        code
        name
        capital @export(as: "capital")
        currency
        _join {
            weather_getCityByName(name: $capital){
                weather {
                    summary {
                        title
                        description
                    }
                    temperature {
                        actual
                    }
                }
            }
        }
    }
}

Now run wunderctl up and you can use curl to Query your newly created API.

curl "http://localhost:9991/api/main/Weather?code=DE"

Here's the response:

{
    "data": {
        "countries_country": {
            "code": "DE",
            "name": "Germany",
            "capital": "Berlin",
            "currency": "EUR",
            "_join": {
                "weather_getCityByName": {
                    "weather": {
                        "summary": {
                            "title": "Clouds",
                            "description": "broken clouds"
                        },
                        "temperature": {
                            "actual": 277.8
                        }
                    }
                }
            }
        }
    }
}

What's going on here? Let's take a look at the Query.

First, we make a request to the Countries API and fetch the capital. We then "export" the name of the capital into an internal variable, simply a placeholder that is not exposed to the public API.

Then, we use the field _join which returns the Query type, allowing us to nest a second Query into the result of the first one. Finally, we use the $capital variable to pass the capital to the second Query and fetch the weather.
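
Conceptually, the execution of this Query can be modelled as two sequential steps with a variable handed from the first to the second. The following is a mental model only, not how the WunderGraph engine is implemented; `executeJoin` and the two injected resolvers are hypothetical stand-ins for the upstream APIs.

```typescript
type Resolver = (variables: Record<string, unknown>) => Promise<Record<string, any>>;

async function executeJoin(
  publicVariables: { code: string },
  resolveCountry: Resolver, // stands in for countries_country
  resolveWeather: Resolver, // stands in for weather_getCityByName
) {
  // Variables marked @internal start empty; API consumers cannot set them.
  const internalVariables: Record<string, unknown> = {};

  // 1. Resolve the outer field with the public variables ($code).
  const country = await resolveCountry(publicVariables);

  // 2. `capital @export(as: "capital")` copies the resolved value into $capital.
  internalVariables.capital = country.capital;

  // 3. `_join` opens a nested Query that can read the exported variable.
  const weather = await resolveWeather({ name: internalVariables.capital });

  // The response keeps the nesting of the Query, including the _join field.
  return {
    data: {
      countries_country: { ...country, _join: { weather_getCityByName: weather } },
    },
  };
}
```

The important detail is the ordering: the nested Query can only run after the field that exports the variable has been resolved.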

No stitching, no federation, just a simple GraphQL Query. If you want to learn more on how this works, have a look at the Docs on Cross API Joins.

So what are the benefits and drawbacks of this approach?

First, we don't have to write any code to integrate the APIs. We just need to write a GraphQL Query. This means, we don't have to learn Federation or Schema Stitching.

Second, we get a secured and optimized REST API with a typesafe client, authentication, authorization, caching and all the other benefits of WunderGraph.
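
To show what consuming the endpoint from TypeScript could look like, here is a hand-written sketch of a typed wrapper. It is not the generated WunderGraph client: `WeatherResponse`, `HttpGet` and `fetchWeather` are assumptions, while the URL and the response shape are taken from the curl example in this post.

```typescript
// Response shape, written by hand from the JSON response of the Weather operation.
interface WeatherResponse {
  data: {
    countries_country: {
      code: string;
      name: string;
      capital: string;
      currency: string;
      _join: {
        weather_getCityByName: {
          weather: {
            summary: { title: string; description: string };
            temperature: { actual: number };
          };
        };
      };
    };
  };
}

// Injected HTTP function (hypothetical) that resolves to the response body.
type HttpGet = (url: string) => Promise<string>;

async function fetchWeather(
  httpGet: HttpGet,
  code: string,
  baseUrl: string = "http://localhost:9991",
): Promise<WeatherResponse> {
  // Only the public variable `code` is sent; `capital` is internal to the operation.
  const url = `${baseUrl}/api/main/Weather?code=${encodeURIComponent(code)}`;
  return JSON.parse(await httpGet(url)) as WeatherResponse;
}
```

In a browser, `httpGet` could be as small as `(url) => fetch(url).then((res) => res.text())`. A generated client saves you from writing and updating the response type by hand.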

This solution is actually almost the same as the "Server-Side Application-Level" approach above, just without writing any code.

Combined with the WunderHub and the Namespacing, this is actually what we wanted to achieve in the first place, turning APIs into simple re-usable lego blocks.

Okay, enough on the pros. Everything is a tradeoff and so is using WunderGraph.

In comparison to the first approach, we have to deploy the WunderGraph server (WunderNode) somewhere.

You have to learn and understand the newly introduced concepts, like @export, @internal and the _join field.

Another downside is the extra nesting because of the _join field. That's something we'd like to tackle in the future.

We also don't think that this Query-Joining approach is "better" than e.g. Apollo Federation or Schema Stitching. It's a different solution for a different situation.

Ideally, you'd use them together. Build your Microservices with Federation and Schema Stitching. Then bring everything together and expose it securely with WunderGraph.

What about PostgreSQL, MySQL, SQLite, SQL Server etc.?

WunderGraph is more than just another GraphQL Server. We've already got a wide array of connectors for different upstreams:

  1. GraphQL
  2. Apollo Federation
  3. REST / OpenAPI Specification
  4. PostgreSQL
  5. MySQL
  6. SQLite
  7. SQL Server
  8. PlanetScale

This means, using the approach from above, you can easily JOIN data from different Database systems, like PostgreSQL and MySQL, combine them with a REST or GraphQL API, and expose them as a secure REST API with WunderGraph.

What's next

As we've explained, one of the issues with our approach is that the shape of the response can become a bit bloated due to the extra nesting. Because WunderGraph is a Server-Side only GraphQL solution, we're able to adopt another approach that isn't available to APIs that expose GraphQL directly.

We're looking at adopting some ideas from GraphQL lodash, a simple and easy way to modify the response of a GraphQL Query using directives.

WunderGraph exposes a REST API using JSON Schema as the language for describing the response. That's perfectly aligned with the "lodash" way of modifying the response. When applying a "lodash directive", we don't just modify the response, but also the JSON Schema for that operation. This means the WunderGraph contract stays consistent; we just add a "lodash Middleware" after we've resolved the response.

This should help us flatten the response and add other interesting ways of modifying the response, e.g. calculating the max value of a field, aggregating a response or filtering.
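
To give an idea of what "flattening" could mean, here is a small sketch that hoists everything nested under `_join` into the parent object. It only illustrates the idea: `flattenJoins` is a hypothetical helper, not a WunderGraph directive and not the GraphQL lodash API.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively removes `_join` wrappers by moving their fields one level up.
function flattenJoins(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => flattenJoins(item));
  if (value === null || typeof value !== "object") return value;

  const result: { [key: string]: Json } = {};
  for (const key of Object.keys(value)) {
    const child = flattenJoins(value[key]);
    if (key === "_join" && child !== null && typeof child === "object" && !Array.isArray(child)) {
      // Hoist the joined fields instead of keeping the wrapper.
      // Caveat: a joined field overwrites a parent field with the same name.
      Object.assign(result, child);
    } else {
      result[key] = child;
    }
  }
  return result;
}
```

Applied to the response of the Weather operation, `weather_getCityByName` would end up next to `capital` and `currency`. Since the response is described by a JSON Schema, the same transformation would also have to be applied to the schema of the operation.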

Conclusion

You've learned about multiple ways of joining data from different APIs. We talked about the different ways of solving the problem, pros and cons of Federation and Schema Stitching, and when to use which one.

We've then introduced the concept of joining APIs in the Query, a novel approach by WunderGraph that is possible by doing Server-Side only GraphQL instead of exposing it to the client.

Interested in learning more about WunderGraph?

The best place to learn more about WunderGraph is in our Discord channel. You can join using this link.
