A Look To A Query Inside Postgres Source Code

mghrabi

Ahmed Hisham

Posted on July 11, 2023

A Look To A Query Inside Postgres Source Code

I would hope this blog to be a start of series where we debug PostgreSQL and see what's going on under the hood. In this blog specifically we will start by a very simple example where we stop at some point in the code using GDB debugger tool, and take a deeper look at how PostgreSQL represent the query and check a single property of it to make sure it's as expected, this will hopefully give us some confident debugging PostgreSQL.

Postgres Installation

You of don't already have PostgreSQl installed from source code you can follow this guide here and make sure you include the --enable-debug option. Almost any postgres version will be fine since what we will be debugging is almost untouched since the earliest versions of postgres. Also this blog assumes that you are already familiar with GDB, if not please follow any online tutorial to learn the basics as we won't need more than that :).

Understanding what we are going to debug

What we will debug exactly is a Query struct which is how postgres represent a query, it's defined in src/include/nodes/parsenodes.h in postgres codebase (don't be intimidated, you are not expected to know what each property or part of the code represents as postgres is very very huge codebase):

typedef struct Query
{
    NodeTag     type;

    CmdType     commandType;    /* select|insert|update|delete|merge|utility */

    /* where did I come from? */
    QuerySource querySource pg_node_attr(query_jumble_ignore);

    /*
     * query identifier (can be set by plugins); ignored for equal, as it
     * might not be set; also not stored.  This is the result of the query
     * jumble, hence ignored.
     */
    uint64      queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0));

    /* do I set the command result tag? */
    bool        canSetTag pg_node_attr(query_jumble_ignore);

    Node       *utilityStmt;    /* non-null if commandType == CMD_UTILITY */

    /*
     * rtable index of target relation for INSERT/UPDATE/DELETE/MERGE; 0 for
     * SELECT.  This is ignored in the query jumble as unrelated to the
     * compilation of the query ID.
     */
    int         resultRelation pg_node_attr(query_jumble_ignore);

    /* has aggregates in tlist or havingQual */
    bool        hasAggs pg_node_attr(query_jumble_ignore);
    /* has window functions in tlist */
    bool        hasWindowFuncs pg_node_attr(query_jumble_ignore);
    /* has set-returning functions in tlist */
    bool        hasTargetSRFs pg_node_attr(query_jumble_ignore);
    /* has subquery SubLink */
    bool        hasSubLinks pg_node_attr(query_jumble_ignore);
    /* distinctClause is from DISTINCT ON */
    bool        hasDistinctOn pg_node_attr(query_jumble_ignore);
    /* WITH RECURSIVE was specified */
    bool        hasRecursive pg_node_attr(query_jumble_ignore);
    /* has INSERT/UPDATE/DELETE in WITH */
    bool        hasModifyingCTE pg_node_attr(query_jumble_ignore);
    /* FOR [KEY] UPDATE/SHARE was specified */
    bool        hasForUpdate pg_node_attr(query_jumble_ignore);
    /* rewriter has applied some RLS policy */
    bool        hasRowSecurity pg_node_attr(query_jumble_ignore);
    /* is a RETURN statement */
    bool        isReturn pg_node_attr(query_jumble_ignore);

    List       *cteList;        /* WITH list (of CommonTableExpr's) */

    List       *rtable;         /* list of range table entries */

    /*
     * list of RTEPermissionInfo nodes for the rtable entries having
     * perminfoindex > 0
     */
    List       *rteperminfos pg_node_attr(query_jumble_ignore);
    FromExpr   *jointree;       /* table join tree (FROM and WHERE clauses);
                                 * also USING clause for MERGE */

    List       *mergeActionList;    /* list of actions for MERGE (only) */
    /* whether to use outer join */
    bool        mergeUseOuterJoin pg_node_attr(query_jumble_ignore);

    List       *targetList;     /* target list (of TargetEntry) */

    /* OVERRIDING clause */
    OverridingKind override pg_node_attr(query_jumble_ignore);

    OnConflictExpr *onConflict; /* ON CONFLICT DO [NOTHING | UPDATE] */

    List       *returningList;  /* return-values list (of TargetEntry) */

    List       *groupClause;    /* a list of SortGroupClause's */
    bool        groupDistinct;  /* is the group by clause distinct? */

    List       *groupingSets;   /* a list of GroupingSet's if present */

    Node       *havingQual;     /* qualifications applied to groups */

    List       *windowClause;   /* a list of WindowClause's */

    List       *distinctClause; /* a list of SortGroupClause's */

    List       *sortClause;     /* a list of SortGroupClause's */

    Node       *limitOffset;    /* # of result tuples to skip (int8 expr) */
    Node       *limitCount;     /* # of result tuples to return (int8 expr) */
    LimitOption limitOption;    /* limit type */

    List       *rowMarks;       /* a list of RowMarkClause's */

    Node       *setOperations;  /* set-operation tree if this is top level of
                                 * a UNION/INTERSECT/EXCEPT query */

    /*
     * A list of pg_constraint OIDs that the query depends on to be
     * semantically valid
     */
    List       *constraintDeps pg_node_attr(query_jumble_ignore);

    /* a list of WithCheckOption's (added during rewrite) */
    List       *withCheckOptions pg_node_attr(query_jumble_ignore);

    /*
     * The following two fields identify the portion of the source text string
     * containing this query.  They are typically only populated in top-level
     * Queries, not in sub-queries.  When not set, they might both be zero, or
     * both be -1 meaning "unknown".
     */
    /* start location, or -1 if unknown */
    int         stmt_location;
    /* length in bytes; 0 means "rest of string" */
    int         stmt_len pg_node_attr(query_jumble_ignore);
} Query;
Enter fullscreen mode Exit fullscreen mode

What we want to explore from this huge struct is exactly one property, to verify our predictions, the property is CmdType commandType; which as described is one from ** select|insert|update|delete|merge|utility **, fair enough.

Where to start?

Make sure you have a table created in the database you are running to be able to run sql queries against. Then attach GDB to the postgres process, to know the postgres process, run your postgres instance first then enter the following command:

testdb=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
          17125
(1 row)
Enter fullscreen mode Exit fullscreen mode

copy the process id and then open another terminal and run:

gdb -p 17125
Enter fullscreen mode Exit fullscreen mode

Now we should be ready to debug.

Debugging planner function

One of the very important stages that a query goes through is the planner stage, it's the pre-execution phase in which postgres decide the optimum way to fetch data and which algorithm to use etc. We will inspect the planner function, but where can we find the planner function?, you can simply ask vscode to find it for you by type Ctrl + Shift + h (on windows), and search for planner( like the following:

Image description

we found it!, now open the terminal where you ran gdb, and create a breakpoint at the planner function:

b src/backend/optimizer/plan/planner.c:planner
Enter fullscreen mode Exit fullscreen mode

you should get an output similar to:

Breakpoint 1 at 0x560af507bcc0: file planner.c, line 269.
Enter fullscreen mode Exit fullscreen mode

which verifies the breakpoint is created.

now let's open postgres instance terminal and run a very simple SELECT query (but make sure you already created the users table to run the same command I'm running, but any other table should work):

testdb=# SELECT * FROM users;
Enter fullscreen mode Exit fullscreen mode

Go back to GDB terminal and press c and then enter, this will get us to where we created the breakpoint (at the planner function), you should get an output similar to the following:

Breakpoint 1, planner (parse=0x560af6cef470, query_string=0x560af6cee618 "SELECT * from users;", 
    cursorOptions=256, boundParams=0x0) at planner.c:269
warning: Source file is more recent than executable.
269      * so you'd better copy that data structure if you want to plan more than once.
(gdb) 
Enter fullscreen mode Exit fullscreen mode

You can notice that the argument parse is a pointer to a ** Query** which is what we want to inspect, print it out to make sure:

(gdb) print parse
$1 = (Query *) 0x560af6cef470
Enter fullscreen mode Exit fullscreen mode

mmmm right!, now we want to check the commandType variable, and according to the query we ran it should be SELECT:

(gdb) print parse->commandType
$2 = CMD_SELECT
Enter fullscreen mode Exit fullscreen mode

Everything is as expected. Now we should have gained a little experience with debugging postgers, and specially the Query struct, which you can go and play around with gdb to check other properties in it like the targetList variable.

I hope you found this blog helpful somehow.

💖 💪 🙅 🚩
mghrabi
Ahmed Hisham

Posted on July 11, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related