Easy setup for scaling out your graph-node

tumf

Posted on May 2, 2022

Japanese version

If, for some reason, you run your own graph-node instead of using a hosted service such as The Graph's, you also need to plan for scaling out your graph-nodes. Without a configuration that allows scale-out, users' Web3 clients will often receive 504 (Gateway Timeout) responses. This article introduces an easy way to scale out graph-node.

TL;DR

Rewrite the official docker-compose.yml as follows (the explanation comes after).

version: '3'
services:
  graph-node-index:
    image: graphprotocol/graph-node
    ports:
      - '8020:8020'
    depends_on:
      - ipfs
      - postgres
    extra_hosts:
      - host.docker.internal:host-gateway
    environment:
      postgres_host: postgres
      postgres_user: graph-node
      postgres_pass: let-me-in
      postgres_db: graph-node
      ipfs: 'ipfs:5001'
      ethereum: 'mainnet:http://host.docker.internal:8545'
      GRAPH_LOG: info
      node_role: index-node
      node_id: index-node
      BLOCK_INGESTOR: index-node

  graph-node-query:
    image: graphprotocol/graph-node
    ports:
      - '8000:8000'
      - '8001:8001'
    depends_on:
      - ipfs
      - postgres
    extra_hosts:
      - host.docker.internal:host-gateway
    environment:
      postgres_host: postgres
      postgres_user: graph-node
      postgres_pass: let-me-in
      postgres_db: graph-node
      ipfs: 'ipfs:5001'
      ethereum: 'mainnet:http://host.docker.internal:8545'
      GRAPH_LOG: info
      node_role: query-node
  ipfs:
    image: ipfs/go-ipfs:v0.4.23
    ports:
      - '5001:5001'
    volumes:
      - ./data/ipfs:/data/ipfs
  postgres:
    image: postgres
    ports:
      - '5432:5432'
    command:
      [
        "postgres",
        "-cshared_preload_libraries=pg_stat_statements",
        "-cmax_connections=100"
      ]
    environment:
      POSTGRES_USER: graph-node
      POSTGRES_PASSWORD: let-me-in
      POSTGRES_DB: graph-node
    volumes:
      - ./data/postgres:/var/lib/postgresql/data

Then run the node as follows:

docker-compose up -d --scale graph-node-query=5

The rest of this article explains the changes.

Scale out graph nodes

I modified the docker-compose configuration so that graph nodes can be scaled out. There are two main changes:

  • Split graph-node into two roles: index node and query node
  • Increase the number of concurrent PostgreSQL connections

Increase the number of concurrent PostgreSQL connections

Let's start with the easy one. Add a startup option to increase the number of concurrent PostgreSQL connections.

@@ -34,7 +53,8 @@
     command:
       [
         "postgres",
-        "-cshared_preload_libraries=pg_stat_statements"
+        "-cshared_preload_libraries=pg_stat_statements",
+        "-cmax_connections=100"
       ]
     environment:
       POSTGRES_USER: graph-node

Because scale-out increases the number of graph nodes, PostgreSQL must also accept more concurrent connections. Here I set max_connections to 100.
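To sanity-check a value like 100, a rough connection budget can be sketched as below. The per-node pool size of 10 is only an illustrative assumption, not a documented graph-node default, and the function names are hypothetical; check the pool size your graph-node version actually uses.

```shell
#!/bin/sh
# Rough PostgreSQL connection budget for a scaled-out deployment.
# ASSUMPTION: every graph-node holds at most POOL_SIZE connections.

# required_connections INDEX_NODES QUERY_NODES POOL_SIZE
required_connections() {
  echo $(( ($1 + $2) * $3 ))
}

# fits_budget REQUIRED MAX_CONNECTIONS -> prints "ok" or "too small"
fits_budget() {
  if [ "$1" -le "$2" ]; then
    echo "ok"
  else
    echo "too small"
  fi
}

# 1 index node + 5 query nodes, assumed pool size of 10
needed=$(required_connections 1 5 10)
echo "needed=$needed: $(fits_budget "$needed" 100)"
```

Under these assumptions the six nodes need 60 connections, which leaves some headroom below 100 for other clients such as psql.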

Split graph-node into two roles: index node and query node

Next, split graph-node into two roles, index-only nodes and query-only nodes, and scale out by increasing the number of query-only nodes. If you scale out without separating the roles, you end up with multiple index nodes that compete for indexing work. Another approach is to give each node a unique node_id, but I did not take it this time (see the postscript for the reason).

Whether a node ingests blocks is controlled by the DISABLE_BLOCK_INGESTOR environment variable at graph-node startup. With DISABLE_BLOCK_INGESTOR=true the node does not ingest blocks, which is what a query node needs; the index node leaves block ingestion enabled. This is a bit complicated, but you do not set the variable yourself: the start script in the official Docker image sets it according to node_role.
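A simplified sketch of that switching logic follows. It is not the actual start script: it assumes that query nodes always run with DISABLE_BLOCK_INGESTOR=true and that an index node ingests blocks only when its node_id equals BLOCK_INGESTOR. The function name ingestor_setting and the "unset" marker are made up for illustration, so verify the behavior against the script in the image you deploy.

```shell
#!/bin/sh
# Simplified sketch of role dispatch; NOT the official start script.
# Prints the DISABLE_BLOCK_INGESTOR setting a node would run with.

# ingestor_setting NODE_ROLE NODE_ID BLOCK_INGESTOR
ingestor_setting() {
  case "$1" in
    query-node)
      # Assumption: query nodes never ingest blocks.
      echo "DISABLE_BLOCK_INGESTOR=true"
      ;;
    index-node)
      # Assumption: only the index node named in BLOCK_INGESTOR ingests.
      if [ "$2" = "$3" ]; then
        echo "DISABLE_BLOCK_INGESTOR=unset"
      else
        echo "DISABLE_BLOCK_INGESTOR=true"
      fi
      ;;
    *)
      echo "unknown node_role: $1" >&2
      return 1
      ;;
  esac
}

# The two services from the compose file in this article
ingestor_setting index-node index-node index-node
ingestor_setting query-node "" index-node
```

With the settings used in this article, only graph-node-index keeps block ingestion enabled, no matter how many query nodes are started.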

  graph-node-index:
    environment:
      node_role: index-node
      node_id: index-node
      BLOCK_INGESTOR: index-node
  graph-node-query:
    environment:
      node_role: query-node

With only this change, the two services would publish the same host ports and conflict. To prevent that, the index node (graph-node-index) publishes only the API port (8020), and the query node (graph-node-query) publishes only HTTP (8000) and WebSocket (8001).
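The host-side clash can be checked mechanically. The sketch below uses two hypothetical helpers, host_port and duplicate_host_ports, that take "HOST:CONTAINER" mappings and print any host port that appears more than once.

```shell
#!/bin/sh
# Hypothetical helpers for spotting host-port clashes between services.

# host_port MAPPING -> prints the host side of a "HOST:CONTAINER" mapping
host_port() {
  echo "${1%%:*}"
}

# duplicate_host_ports MAPPING... -> prints host ports listed more than once
duplicate_host_ports() {
  for m in "$@"; do
    host_port "$m"
  done | sort | uniq -d
}

# Split layout: index node publishes 8020, query node 8000 and 8001.
# Prints nothing because no host port is repeated.
duplicate_host_ports 8020:8020 8000:8000 8001:8001

# Two services that both publish 8000 and 8020 would clash.
duplicate_host_ports 8000:8000 8020:8020 8000:8000 8020:8020
```

The same reasoning applies to replicas of a single service: Compose warns that a service publishing a fixed host port can clash when it is scaled on one host, so it is worth verifying the fixed 8000 and 8001 mappings in your own environment.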

  graph-node-index:
    image: graphprotocol/graph-node
    ports:
      # - '8000:8000'
      # - '8001:8001'
      - '8020:8020'
      #- '8030:8030'
      #- '8040:8040'
  graph-node-query:
    image: graphprotocol/graph-node
    ports:
      - '8000:8000'
      - '8001:8001'
      #- '8020:8020'
      #- '8030:8030'
      #- '8040:8040'

With this in place, scale out the query node and you're good to go.

docker-compose up -d --scale graph-node-query=5

Yay, good job!!!


(Postscript) Why didn't I avoid the conflict by making the node_id unique?

My first approach was to make node_id unique. However, I changed my approach for the following two reasons:

Hiding API node endpoints

I wanted to completely isolate the API node from the query nodes that the general public accesses, to reduce the chance of strangers accessing the API without permission.

Using the official Dockerfile as is

The easiest way to make node_id unique would be to set it to $HOSTNAME, but that would require modifying the start script, which is a hassle I wanted to avoid.
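For reference, the kind of change that approach would need can be sketched as follows. The helper unique_node_id is hypothetical and not part of the official image; it also assumes hyphens should be replaced with underscores in case node IDs cannot contain them.

```shell
#!/bin/sh
# Hypothetical helper: derive a per-container node_id from the hostname.
# ASSUMPTION: hyphens are replaced so the ID uses only letters, digits, "_".
unique_node_id() {
  printf '%s\n' "$1" | sed 's/-/_/g'
}

# Use the container hostname if set, else a placeholder for illustration.
node_id=$(unique_node_id "${HOSTNAME:-graph-node-1}")
echo "node_id=$node_id"
```

A customized start script would then pass this value to graph-node in place of a fixed node_id, which is exactly the modification the approach in this article avoids.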
