Loose Typings in GraphQL for Dynamic Grouping and Aggregation Queries

sebastian_scholl

Seb Scholl

Posted on January 7, 2020

Loose Typings in GraphQL for Dynamic Grouping and Aggregation Queries

Group... of birds!

Several months ago a developer building on 8base reached out via Intercom with, what ended up being, a roadmap changing message. It read:

...I'm building a marketing campaign management system that's integrated with my eCommerce business. I've gotten the whole thing working, but feel stuck when it comes to generating good reports. Is there any way I can do aggregations through the GraphQL API like I can when using SQL?

The 8base team had a few cursory ideas on how to approach this problem. However, having to work within the confines of the GraphQL specification invalidated a number of the initially suggested approaches. That said, we felt that this wasn't the last time a developer would wish they had such a feature. So as a result, "Groupings and Aggregations" was added to the top of our development team's to-do list.

Before taking some time to talk about how we were able to achieve groupings and aggregations within the confines of GraphQL, let's first check out these new API features themselves. We'll use a simplified data model that's similar to the one the developer who reached out was using.

The Data Model

Every order being made on the eCommerce site triggered a webhook that created an Invoice record in 8base. Additionally, Invoices are associated with a Campaign record when applicable (within the campaign's start and end dates). Thus we have our two tables with the following fields defined.

Campaigns
• Name: Text
• StartDate: Date
• EndDate: Date
• Discount: Number
• Invoices: Table (has-many relation)

Invoices
• OrderName: Text
• OrderDate: Date
• OrderPrice: Number
• Campaign: Table (belongs-to relation)

Simple enough! Right? Now with this data model defined, we can start writing some pretty useful queries - whether they're for reports, chart-data, or whatever else.

Comparing Campaign Performance

We filled the database with about 600 fake invoices and several campaigns. Using grouping and aggregations, we're going to run a query that gives us the following information back.

  1. The ID, Name, and Creator's email of each campaign.
  2. The number of Invoices attributed to each campaign.
  3. The average sale price and total revenue generated by each campaign.
  4. The order number of every Invoice in each group. ‍ In order to accomplish this, we're going to use the following query. Pay attention to the inline comments, as well as notice the relationships between "as" declarations and the GraphQL response object keys that are being set to type values.
query {
  invoicesList(groupBy: {
    query: {
      # We'll be grouping by Campign ID 
      # while aliasing the Name field as well.
      campaign: {
        id: { as: "CampaignId" },
        name: { as: "CampaignName"},
        # Jump into the owner relationship and 
        # alias the email field.
        createdBy: {
          email: { as: "CampaignOwner" }
        }
      },
      # On a single field we can run multiple
      # aggregations. For example, both summing
      # and averaging all prices in each group.
      orderPrice: [{
        as: "AverageSalePrice",
        fn: { aggregate: AVG }
      }, {
        as: "TotalRevenue",
        fn: { aggregate: SUM }  
      }],
      # Using the _groups key, we can access the
      # list of records used in the grouping function.
      # This can would on any level/relationship
      _group: { as: "Invoices" }
    }
  }) {
    groups {
      # The idea of "Loose Typings" come into play in
      # that the API cannot know ahead of time what
      # value types an aggregated result will be. Therefore
      # the developer can declare them explicitly in the query.
      CampaignId: ID
      CampaignName: String
      CampaignOwner: String
      AverageSalePrice: Float
      TotalRevenue: Float
      # On the _group we can re-access the records used
      # in the group on the normal record list types.
      Invoices: InvoiceGroup {
        count
        items { orderNumber }
      }
    }
  }
}

While the syntax may look a little funky, a deceivingly powerful SQL statement gets generated in the background that fulfills the query. Notice how, for every value that we aliased using as in the groupBy query itself, we were able to use the name as a key in the response object while assigning a type on the fly - without it mattering whether the field or aggregate value, was on the primary or a related table!

‍Running this query, we get the following JSON response from the 8base GraphQL API.

{
  "data": {
    "invoicesList": {
      "groups": [
        {
          "CampaignId": null,
          "CampaignName": null,
          "CampaignOwner": null,
          "AverageSalePrice": 49.804094,
          "TotalRevenue": 17033,
          "Invoices": {
            "count": 342,
            "items": [
              { "orderNumber": "000099" },
              // 341 other orderNumbers
            ]
          }
        },
        {
          "CampaignId": "ck4mza2pt00hs07mh0jli1hj9",
          "CampaignName": "Summer Sale",
          "CampaignOwner": "melissa@candleshop.com",
          "AverageSalePrice": 50.686567,
          "TotalRevenue": 3396,
          "Invoices": {
            "count": 67,
            "items": [
              { "orderNumber": "000256"},
              // 66 other orderNumbers
            ]
          }
        },
        {
          "CampaignId": "ck4mz9g2r00ax07ju6v8a6oaj",
          "CampaignName": "November Madness",
          "CampaignOwner": "jenny@candleshop.com",
          "AverageSalePrice": 46.953125,
          "TotalRevenue": 3005,
          "Invoices": {
            "count": 64,
            "items": [
              { "orderNumber": "000461" },
              // 63 other orderNumbers
            ]
          }
        },
        {
          "CampaignId": "ck4mz8sgw009c07juevcgfcqm",
          "CampaignName": "Christmas Blowout",
          "CampaignOwner": "james@candleshop.com",
          "AverageSalePrice": 50.128713,
          "TotalRevenue": 5063,
          "Invoices": {
            "count": 101,
            "items": [
              { "orderNumber": "000509" },
              // 100 other orderNumbers
            ]
          }
        }
      ]
    }
  }
}


Now, this is already pretty awesome. However, we can still take it a step further. Let's add some new, and totally believable, requirements to our specification.

  1. Only return groups belonging to a Campaign.
  2. Only return campaigns with an average sale price greater than or equal to $50.00 ‍ In order to achieve this, we're able to use a having clause. This allows us to use predicates that are specified as objects on declared types - using aggregated fields. For brevity's sake, I'll only post the new part of the query. However, know that it must come after the groupBy query argument.
query {
  invoicesList(groupBy: {
    query: { ...queryOptions },
    # Having let's us filter by an aggregated 
    # value using type specific predicates.
    having: {
      AND: [{
        alias: "CampaignName"
        string: { is_not_empty: true }
      },{
        alias: "TotalRevenue"
        float: { gte: 50.00 }
      }]
    }
  }) {
    groups { ...groupOptions }
  }
}

By specifying both the alias - which must match an alias in the query - and the field's expected type, we're able to declare a predicate value that filters the results accordingly on an aggregated/grouped field. As a result, the following filtered response gets returned.

{
  "data": {
    "invoicesList": {
      "groups": [
        {
          "CampaignId": "ck4mza2pt00hs07mh0jli1hj9",
          "CampaignName": "Summer Sale",
          "CampaignOwner": "melissa@candleshop.com",
          "AverageSalePrice": 50.686567,
          "TotalRevenue": 3396,
          "Invoices": {
            "count": 67,
            "items": [
              { "orderNumber": "000256"},
              // 66 other orderNumbers
            ]
          }
        },
        {
          "CampaignId": "ck4mz8sgw009c07juevcgfcqm",
          "CampaignName": "Christmas Blowout",
          "CampaignOwner": "james@candleshop.com",
          "AverageSalePrice": 50.128713,
          "TotalRevenue": 5063,
          "Invoices": {
            "count": 101,
            "items": [
              { "orderNumber": "000509" },
              // 100 other orderNumbers
            ]
          }
        }
      ]
    }
  }
}

How it works?

It might be hard to appreciate these queries without being somewhat familiar with SQL. That said, each of these groupBy GraphQL queries generates a raw SQL statement that could make a lumber-jack cry. For example, our query above generates the following SQL statement.


SELECT GROUP_CONCAT(DISTINCT `MT`.`id`) AS `_grouping`, 
            AVG(`MT`.`orderPrice`) AS `AverageSalePrice`, 
            SUM(`MT`.`orderPrice`) AS `TotalRevenue`, 
            `campaign`.`id` AS `CampaignId`, 
            `campaign`.`name` AS `CampaignName`, 
            `campaign->createdBy`.`email` AS `CampaignOwner`, 
            GROUP_CONCAT(DISTINCT `MT`.`id`) AS `Invoices`
      FROM `8base_5d84e92c`.`Invoices` AS `MT`
    LEFT JOIN (`8base_5d84e92`.`CampaignInvoices` AS `campaign->CampaignInvoices` 
                INNER JOIN `8base_5d84e92`.`Campaigns` AS `campaign`
          ON `campaign->CampaignInvoices`.`campaignBId` = `campaign`.`id`  
                        AND `campaign`.`deletedAt` = 0)
          ON `MT`.`id` = `campaign->CampaignInvoices`.`invoiceAId`
          LEFT JOIN `8base_5d84e92`.`Users` AS `campaign->createdBy`
          ON `campaign`.`createdById` = `campaign->createdBy`.`id`
                        AND `campaign->createdBy`.`deletedAt` = 0
            WHERE `MT`.`deletedAt` = 0 
        GROUP BY `campaign`.`id`, `campaign`.`name`, `campaign->createdBy`.`email` 
        HAVING ((`CampaignName` IS NOT NULL AND `CampaignName` <> '') 
                AND `TotalRevenue` >= 50) 
        ORDER BY `_grouping` ASC LIMIT 0, 5000

With that in mind, essentially what's happening on the GraphQL side is a sophisticated utilization of aliases that allows for dynamic types and attribute-names to be used. On the 8base side, we attempt to auto-cast types in the response. However, this is completely unreliable since there is no way of knowing how a grouped or aggregated field will be returned.

‍This is why it is necessary to explicitly, and manually, specify types in the response object. That way, we're able to enrich the GraphQL schema on the fly to accommodate the dynamic key-value pairs and objects. In tandem with auto-generating a number of custom types ahead of time, such as the GroupByField and a <TableName>Group for every table in the 8base workspace, the grouping and aggregation functions having everything they need to run... and only tease the limits of the GraphQL spec!

{
  # Using the aliased name and explicitly declaring a type, we're
  # able to generate dynamically name object keys.
  groups {
    CampaignId: ID
    CampaignName: String
    CampaignOwner: String
    AverageSalePrice: Float
    TotalRevenue: Float
    # The <TableName>Group type is auto-generated to allow for
    # the aliased group to be accessed as a normal list of records.
    Invoices: InvoiceGroup {
      count
      items { orderNumber }
    }
  }
}

💖 💪 🙅 🚩
sebastian_scholl
Seb Scholl

Posted on January 7, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related