Refactoring The Create Barbell Graph Function

In the last post of our source code tour of Apache AGE, we implemented a function to generate a Barbell graph, and you can find the result in AGE's GitHub repository.

However, the function turned out to be a little hard to understand, as it takes lots of steps just to create some vertexes and edges.

So let's refactor it into smaller helper functions that streamline the creation of vertexes and edges. This makes the code easier to read and makes other kinds of pre-modeled graphs easier to build.

First, let's put together in a struct every component we need to build nodes and edges:

typedef struct graph_components 
{
    Oid graph_oid;
    char* graph_name;
    int32 graph_size;

    char* vertex_label;
    int32 vertex_label_id;
    agtype* vertex_properties;
    Oid vtx_seq_id;

    char* edge_label;
    int32 edge_label_id;
    agtype* edge_properties;
    Oid edge_seq_id;

} graph_components;

The function has so many error checks that it got messy, so let's move them into a function of their own:

static void validate_barbell_function_args(PG_FUNCTION_ARGS)
{
    if (PG_ARGISNULL(0))
    {
        ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                        errmsg("Graph name cannot be NULL")));
    }
    if (PG_ARGISNULL(1) || PG_GETARG_INT32(1) < 3)
    {
        ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("Graph size cannot be NULL or lower than 3")));
    }
    if (PG_ARGISNULL(2) || PG_GETARG_INT32(2) < 0 )
    {
        ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("Bridge size cannot be NULL or lower than 0")));
    }
    if (PG_ARGISNULL(5))
    {
        ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("edge label cannot be NULL")));
    }
    if (!PG_ARGISNULL(3) && !PG_ARGISNULL(5) &&
        strcmp(NameStr(*(PG_GETARG_NAME(3))), 
               NameStr(*(PG_GETARG_NAME(5)))) == 0)
    {
        ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("vertex and edge labels cannot be the same")));
    }
}

Now we just call it with validate_barbell_function_args(fcinfo);.

fcinfo is the FunctionCallInfo parameter that the PG_FUNCTION_ARGS macro declares; it carries all the arguments passed to the function.

Our main function now looks like this:
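To see why fcinfo is available by name even though we never write it in the function signature, here is a minimal stand-alone sketch. It does not use the PostgreSQL headers: mock_call_info, MOCK_FUNCTION_ARGS, MOCK_ARGISNULL and MOCK_GETARG_INT32 are hypothetical stand-ins that only mimic the idea of a macro expanding to a parameter called fcinfo, and the struct layout is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the call-info struct (NOT the real layout). */
typedef struct mock_call_info
{
    int     nargs;
    int32_t values[8];
    bool    isnull[8];
} mock_call_info;

/* The macro expands to a parameter that is always named fcinfo. */
#define MOCK_FUNCTION_ARGS      mock_call_info *fcinfo
#define MOCK_ARGISNULL(n)       (fcinfo->isnull[(n)])
#define MOCK_GETARG_INT32(n)    (fcinfo->values[(n)])

/*
 * Mirrors the graph size check. Returns false where the real helper
 * would raise an error with ereport(ERROR, ...).
 */
static bool mock_validate_size(MOCK_FUNCTION_ARGS)
{
    assert(fcinfo->nargs > 1);

    if (MOCK_ARGISNULL(1) || MOCK_GETARG_INT32(1) < 3)
        return false;
    return true;
}

/* The entry point forwards fcinfo untouched, like our main function. */
static bool mock_entry_point(MOCK_FUNCTION_ARGS)
{
    return mock_validate_size(fcinfo);
}
```

Because both functions receive the same fcinfo pointer, the helper sees exactly the arguments the entry point was called with, which is why moving the checks out does not change their behavior.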

PG_FUNCTION_INFO_V1(age_create_barbell_graph);

Datum age_create_barbell_graph(PG_FUNCTION_ARGS) 
{
    validate_barbell_function_args(fcinfo);

    PG_RETURN_VOID();
}

Now that we know the arguments are valid, let's process them and store them in our graph_components struct, which we declare in the main function and simply call graph:

PG_FUNCTION_INFO_V1(age_create_barbell_graph);

Datum age_create_barbell_graph(PG_FUNCTION_ARGS) 
{
    struct graph_components graph;

    validate_barbell_function_args(fcinfo);
    process_arguments(fcinfo, &graph);

    PG_RETURN_VOID();
}

static void process_arguments(PG_FUNCTION_ARGS, graph_components* graph)
{
    graph->graph_name = NameStr(*(PG_GETARG_NAME(0)));
    graph->graph_size = PG_GETARG_INT32(1);

    if (PG_ARGISNULL(3))    graph->vertex_label = AG_DEFAULT_LABEL_VERTEX;
    else                    graph->vertex_label = NameStr(*(PG_GETARG_NAME(3)));

    if (PG_ARGISNULL(4))    graph->vertex_properties = create_empty_agtype();
    else                    graph->vertex_properties = (agtype*)(PG_GETARG_DATUM(4));

    if (PG_ARGISNULL(5))    graph->edge_label = AG_DEFAULT_LABEL_EDGE;
    else                    graph->edge_label = NameStr(*(PG_GETARG_NAME(5)));

    if (PG_ARGISNULL(6))    graph->edge_properties = create_empty_agtype();
    else                    graph->edge_properties = (agtype*)(PG_GETARG_DATUM(6));
}

Here we fetch each argument with the PG_GETARG_* macro for the type we expect and convert it to the type we need. For example, we need graph_name as a char*, but we receive it as a Name, so we convert it directly with NameStr(*(PG_GETARG_NAME(0))). The 0 is there because it's the first parameter.

Now we can build our two complete graphs, but first let's change the create_complete_graph function so that it returns the graphid of the last vertex it created:
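The Name to char* conversion can be sketched without PostgreSQL at all. Below, mock_name_data and MOCK_NAME_STR imitate the shape of NameData and NameStr (a fixed-size character buffer and a macro that yields that buffer); the names, the helper functions and the 64-byte length are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NAMEDATALEN 64   /* assumption: the usual default name length */

/* Stand-in for NameData: a fixed-size, NUL-terminated character buffer. */
typedef struct mock_name_data
{
    char data[MOCK_NAMEDATALEN];
} mock_name_data;

/* Same shape as NameStr: takes the struct itself and yields its char array. */
#define MOCK_NAME_STR(name) ((name).data)

/* Fill a name from a C string, truncating if it does not fit. */
static void mock_name_set(mock_name_data *name, const char *src)
{
    size_t len;

    assert(name != NULL && src != NULL);

    len = strlen(src);
    if (len > MOCK_NAMEDATALEN - 1)
        len = MOCK_NAMEDATALEN - 1;

    memset(name->data, 0, MOCK_NAMEDATALEN);
    memcpy(name->data, src, len);
}

/* Like our label handling: use a default when the argument is NULL. */
static const char *mock_label_or_default(mock_name_data *arg,
                                         const char *fallback)
{
    return arg == NULL ? fallback : MOCK_NAME_STR(*arg);
}
```

Note that the macro returns a pointer into the struct rather than a copy, so the resulting char* is only valid as long as the original name is.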

We just change the last line of the function from PG_RETURN_VOID(); to

PG_RETURN_DATUM(GRAPHID_GET_DATUM(end_vertex_graph_id));

And in the age--1.3.0.sql file, in the create_complete_graph function, we change RETURNS void to RETURNS graphid:

CREATE FUNCTION ag_catalog.create_complete_graph(graph_name name, nodes int, edge_label name, node_label name = NULL)
RETURNS graphid   -- change here!!!
LANGUAGE c
CALLED ON NULL INPUT
PARALLEL SAFE
AS 'MODULE_PATHNAME';

Now, back in the graph_generation.c file, we create two variables to hold the graphid of one node from each complete graph, so we can connect them with a bridge later. We store the return values of create_complete_graph in these variables:

Datum root1, root2;

// create two separate complete graphs
root1 = DirectFunctionCall4(create_complete_graph, 
                            CStringGetDatum(graph.graph_name), 
                            Int32GetDatum(graph.graph_size),
                            CStringGetDatum(graph.edge_label), 
                            CStringGetDatum(graph.vertex_label));
root2 = DirectFunctionCall4(create_complete_graph, 
                            CStringGetDatum(graph.graph_name), 
                            Int32GetDatum(graph.graph_size),
                            CStringGetDatum(graph.edge_label), 
                            CStringGetDatum(graph.vertex_label));

With this we already have two complete graphs disconnected from each other.

Now let's build a bridge of n nodes connecting the two complete graphs.

But to build nodes and edges, we need to fetch their label ids and sequence ids. Let's write a function for each of them:

static void fetch_label_ids(graph_components* graph) 
{
    graph->graph_oid = get_graph_oid(graph->graph_name);
    graph->vertex_label_id = 
        get_label_id(graph->vertex_label, 
                     graph->graph_oid);
    graph->edge_label_id = 
        get_label_id(graph->edge_label, 
                     graph->graph_oid);
}

static void fetch_seq_ids(graph_components* graph)
{
    graph_cache_data* graph_cache;
    label_cache_data* vtx_cache;
    label_cache_data* edge_cache;

    graph_cache = search_graph_name_cache(graph->graph_name);
    vtx_cache = search_label_name_graph_cache(graph->vertex_label,
                                              graph->graph_oid);
    edge_cache = search_label_name_graph_cache(graph->edge_label,
                                               graph->graph_oid);

    graph->vtx_seq_id = 
        get_relname_relid(NameStr(vtx_cache->seq_name),
                          graph_cache->namespace);
    graph->edge_seq_id = 
        get_relname_relid(NameStr(edge_cache->seq_name),
                          graph_cache->namespace);
}
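Why do we need both a label id and a sequence value? As far as I understand it, a graphid packs the two into a single 64-bit integer. Here is a stand-alone sketch of that idea, assuming a 16-bit label id in the high bits and a 48-bit entry id in the low bits. The sketch_ names are hypothetical and this is not AGE's own code, so check graphid.h before relying on the exact layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 16-bit label id on top of a 48-bit entry id. */
#define SKETCH_ENTRY_ID_BITS 48
#define SKETCH_ENTRY_ID_MASK UINT64_C(0x0000ffffffffffff)

typedef int64_t sketch_graphid;

/* Combine a label id and a sequence value into one 64-bit id. */
static sketch_graphid sketch_make_graphid(int32_t label_id, int64_t entry_id)
{
    uint64_t packed;

    assert(label_id >= 0 && label_id <= 0xffff);

    packed = ((uint64_t) label_id << SKETCH_ENTRY_ID_BITS) |
             ((uint64_t) entry_id & SKETCH_ENTRY_ID_MASK);
    return (sketch_graphid) packed;
}

/* Recover the label id from the high bits. */
static int32_t sketch_label_id(sketch_graphid id)
{
    return (int32_t) ((uint64_t) id >> SKETCH_ENTRY_ID_BITS);
}

/* Recover the entry id (the sequence value) from the low bits. */
static int64_t sketch_entry_id(sketch_graphid id)
{
    return (int64_t) ((uint64_t) id & SKETCH_ENTRY_ID_MASK);
}
```

Under this layout the entry id can be larger than a 32-bit integer, so the sequence value should be kept in a 64-bit variable all the way through.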

Now we call these functions inside the main age_create_barbell_graph function, passing the address of the graph struct:

PG_FUNCTION_INFO_V1(age_create_barbell_graph);

Datum age_create_barbell_graph(PG_FUNCTION_ARGS) 
{
    struct graph_components graph;
    Datum root1, root2;

    validate_barbell_function_args(fcinfo);
    process_arguments(fcinfo, &graph);

    // create two separate complete graphs
    root1 = DirectFunctionCall4(create_complete_graph, 
                                CStringGetDatum(graph.graph_name), 
                                Int32GetDatum(graph.graph_size),
                                CStringGetDatum(graph.edge_label), 
                                CStringGetDatum(graph.vertex_label));
    root2 = DirectFunctionCall4(create_complete_graph, 
                                CStringGetDatum(graph.graph_name), 
                                Int32GetDatum(graph.graph_size),
                                CStringGetDatum(graph.edge_label), 
                                CStringGetDatum(graph.vertex_label));

    fetch_label_ids(&graph);
    fetch_seq_ids(&graph);

    PG_RETURN_VOID();
}

Now let's finally connect these graphs. We will write three functions to do this: create_vertex, connect_vertexes_by_graphid and insert_bridge:

static graphid create_vertex(graph_components* graph)
{
    int64 next_index;   // nextval_internal returns an int64
    graphid new_graph_id; 

    next_index = nextval_internal(graph->vtx_seq_id, true);
    new_graph_id = make_graphid(graph->vertex_label_id, 
                                next_index);
    insert_vertex_simple(graph->graph_oid,
                         graph->vertex_label,
                         new_graph_id,
                         create_empty_agtype());
    return new_graph_id;
} 


static graphid connect_vertexes_by_graphid(graph_components* graph, 
                                           graphid out_vtx,
                                           graphid in_vtx)
{
    int64 nextval;   // nextval_internal returns an int64
    graphid new_graphid; 

    nextval = nextval_internal(graph->edge_seq_id, true);
    new_graphid = make_graphid(graph->edge_label_id, nextval);

    insert_edge_simple(graph->graph_oid,
                       graph->edge_label,
                       new_graphid, out_vtx, in_vtx,
                       create_empty_agtype());
    return new_graphid;
}


static void insert_bridge(graph_components* graph, graphid beginning, 
                          graphid end, int32 bridge_size) 
{
    graphid current_graphid;
    graphid prior_graphid;

    prior_graphid = end;

    for (int i = 0; i<bridge_size; i++)
    {
        current_graphid = create_vertex(graph);
        connect_vertexes_by_graphid(graph, prior_graphid, current_graphid);
        prior_graphid = current_graphid;
    }

    // close the bridge: connect the last bridge vertex to the beginning vertex
    connect_vertexes_by_graphid(graph, prior_graphid, beginning);
}
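To check the loop logic without a database, here is a stand-alone sketch that only records the edges insert_bridge would create. Everything in it is hypothetical (the sketch_edge struct, plain long ids instead of graphids, and a counter standing in for the vertex sequence); it just mirrors the control flow of the function above.

```c
#include <assert.h>
#include <stddef.h>

/* One directed edge between two vertex ids. */
typedef struct sketch_edge
{
    long from;
    long to;
} sketch_edge;

/*
 * Records the edges a bridge of bridge_size vertexes would need.
 * New vertex ids are taken from *next_id, which plays the role of
 * the vertex sequence. Returns the number of edges written to out.
 */
static size_t sketch_insert_bridge(long beginning, long end, int bridge_size,
                                   long *next_id, sketch_edge *out,
                                   size_t capacity)
{
    size_t count = 0;
    long prior = end;

    assert(bridge_size >= 0 && capacity >= (size_t) bridge_size + 1);

    for (int i = 0; i < bridge_size; i++)
    {
        long current = (*next_id)++;

        out[count++] = (sketch_edge){ prior, current };
        prior = current;
    }

    /* Final edge closes the bridge on the other complete graph. */
    out[count++] = (sketch_edge){ prior, beginning };
    return count;
}
```

A bridge of n vertexes always needs n + 1 edges, and a bridge of size 0 degenerates into a single edge going straight from end to beginning.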

With this in place, we just fetch the bridge size from the function arguments and call insert_bridge, which makes our main function look like this:

PG_FUNCTION_INFO_V1(age_create_barbell_graph);

Datum age_create_barbell_graph(PG_FUNCTION_ARGS) 
{
    struct graph_components graph;
    Datum root1, root2;
    int32 bridge_size;

    validate_barbell_function_args(fcinfo);
    process_arguments(fcinfo, &graph);

    // create two separate complete graphs
    root1 = DirectFunctionCall4(create_complete_graph, 
                                CStringGetDatum(graph.graph_name), 
                                Int32GetDatum(graph.graph_size),
                                CStringGetDatum(graph.edge_label), 
                                CStringGetDatum(graph.vertex_label));
    root2 = DirectFunctionCall4(create_complete_graph, 
                                CStringGetDatum(graph.graph_name), 
                                Int32GetDatum(graph.graph_size),
                                CStringGetDatum(graph.edge_label), 
                                CStringGetDatum(graph.vertex_label));

    fetch_label_ids(&graph);
    fetch_seq_ids(&graph);

    // connect two vertexes with a path of n vertexes
    bridge_size = PG_GETARG_INT32(2);
    insert_bridge(&graph, DATUM_GET_GRAPHID(root1), 
                  DATUM_GET_GRAPHID(root2), bridge_size);

    PG_RETURN_DATUM(root1);
}

We also changed the return value to the graphid root1, so it can be used to build other pre-modeled graph functions in the future.

To call the function from psql, we issue the command:

SELECT * FROM age_create_barbell_graph('testing_graph',5,3,'nodeLabel','{}','edgeLabel','{}');

This will create a Barbell graph with two complete graphs with 5 nodes each and a bridge of 3 nodes connecting them.
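As a quick sanity check on the result, we can count what the graph should contain. The sketch below is plain arithmetic, not AGE code: the function names are hypothetical, and it assumes create_complete_graph inserts exactly one edge per pair of vertexes.

```c
#include <assert.h>
#include <stdint.h>

/* Two complete graphs of graph_size vertexes plus the bridge vertexes. */
static int64_t sketch_barbell_vertices(int32_t graph_size, int32_t bridge_size)
{
    assert(graph_size >= 3 && bridge_size >= 0);
    return 2 * (int64_t) graph_size + bridge_size;
}

/*
 * Each complete graph has n * (n - 1) / 2 edges (assuming one edge per
 * pair), and a bridge of b vertexes adds b + 1 edges.
 */
static int64_t sketch_barbell_edges(int32_t graph_size, int32_t bridge_size)
{
    int64_t per_complete;

    assert(graph_size >= 3 && bridge_size >= 0);

    per_complete = (int64_t) graph_size * (graph_size - 1) / 2;
    return 2 * per_complete + (bridge_size + 1);
}
```

For the call above (graph size 5, bridge size 3) this gives 13 vertexes and 24 edges, which you can compare against a count query on testing_graph.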

I make these posts in order to guide people into the development of a new technology. If you find anything incorrect, I urge you to comment below so I can fix it. Thanks!

Check Apache AGE: https://age.apache.org/.

Overview — Apache AGE master documentation: https://age.apache.org/age-manual/master/intro/overview.html.

GitHub - apache/age: https://github.com/apache/age

Marco Aurélio Silva de Souza Júnior
