Tom Nijhof
Posted on February 7, 2024
To make a knowledge graph, it helps to have a vocabulary in place; such a vocabulary is called an ontology.
Medical Subject Headings (MeSH) is one such ontology, covering many of the medical terms in current use.
It can be downloaded as an RDF file in N-Triples format, which makes it easy to import into Neo4j with neosemantics (n10s).
Installing n10s in Neo4j Desktop
The next three commands import the 2021 MeSH graph directly into Neo4j. It takes a while before all 2 million nodes and 4 million relationships are loaded.
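// n10s needs a uniqueness constraint on Resource.uri (Neo4j 4.x syntax shown)
// On Neo4j 5+ write it as: CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
CALL n10s.graphconfig.init();
// Download the 2021 MeSH N-Triples file and import it into the graph
CALL n10s.rdf.import.fetch("https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2021/mesh2021.nt","N-Triples");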
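If you are new to RDF, it can help to see roughly what happens during the import. Each line of the N-Triples file is one statement made of a subject, a predicate, and an object. With its default settings, n10s turns rdf:type statements into node labels, statements whose object is another resource into relationships, and statements whose object is a literal into node properties, shortening vocabulary IRIs to names like ns0__TopicalDescriptor. The Python sketch below imitates that mapping on three illustrative triples written in the style of the MeSH file (not copied from it). It is a toy model, not n10s's implementation: it only understands simple lines (no blank nodes or escaped quotes), it drops language tags, and the PREFIXES table is a hypothetical stand-in for the prefixes n10s picks during the import.

```python
import re
from collections import defaultdict

# Simplified N-Triples line: IRI subject, IRI predicate, and an IRI object or a
# literal with an optional @lang tag or ^^<datatype>. Blank nodes and escaped
# quotes inside literals are NOT handled.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"([^"]*)"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)

# Hypothetical prefix table, chosen to match the names seen in the Neo4j graph.
PREFIXES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
    "http://www.w3.org/2000/01/rdf-schema#": "rdfs",
    "http://id.nlm.nih.gov/mesh/vocab#": "ns0",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def shorten(iri):
    """Rewrite a vocabulary IRI as prefix__localName, e.g. ns0__Concept."""
    for namespace, prefix in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}__{iri[len(namespace):]}"
    return iri


def to_property_graph(lines):
    labels = defaultdict(set)  # node IRI -> set of labels
    props = defaultdict(dict)  # node IRI -> {property name: value}
    rels = []                  # (start IRI, relationship type, end IRI)
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m is None:
            continue  # skip blank lines, comments and unsupported syntax
        s, p, o_iri, o_lit = m.groups()
        labels[s].add("Resource")
        if p == RDF_TYPE and o_iri is not None:
            labels[s].add(shorten(o_iri))  # rdf:type becomes a label
        elif o_iri is not None:
            labels[o_iri].add("Resource")
            rels.append((s, shorten(p), o_iri))  # IRI object: relationship
        else:
            props[s][shorten(p)] = o_lit  # literal object: property (lang tag dropped)
    return labels, props, rels


MESH = "http://id.nlm.nih.gov/mesh/"
sample = [
    f"<{MESH}D000001> <{RDF_TYPE}> <{MESH}vocab#TopicalDescriptor> .",
    f'<{MESH}D000001> <http://www.w3.org/2000/01/rdf-schema#label> "Calcimycin"@en .',
    f"<{MESH}D000001> <{MESH}vocab#preferredConcept> <{MESH}M0000001> .",
]
labels, props, rels = to_property_graph(sample)
print(sorted(labels[MESH + "D000001"]))  # ['Resource', 'ns0__TopicalDescriptor']
print(props[MESH + "D000001"])           # {'rdfs__label': 'Calcimycin'}
print(rels)
```

This is also why the labels and properties in the queries below carry the ns0__ and rdfs__ prefixes.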
Exploring the Data
Before I start, I set the caption to rdfs__label for Resource nodes, so the nodes show a name. For ns0__Term nodes, I use ns0__prefLabel.
Naming nodes within Neo4j Desktop
Let's start with the sexiest thing to do: reading the documentation of the RDF data structure for the medical terms used to index medical papers.
Did I say “sexiest”? I meant nerdiest.
I will not go over the full structure; instead, I will select just two elements I think are interesting to start with. Feel free to disagree.
The code snippets in this blog are Cypher queries you can run in Neo4j. You don't need them to follow along, but they show how I got the results, or they can serve as proof that my Cypher is not optimized, up to standard, etc.
Terms, Descriptors, and Concepts
Descriptors, concepts, and terms are very closely related. Descriptors are the broadest: each descriptor contains one or more concepts, one of which is the preferred concept. Concepts, in turn, have terms, which hold the synonyms for the concept. Each concept has one preferred term, and the descriptor also has one preferred term out of all of them (see picture below).
// Descriptor (n) with the terms linked directly to it (q), its concepts (p), and their terms (z)
MATCH (q:ns0__Term)<-[]-(n:ns0__TopicalDescriptor)-[]->(p:ns0__Concept)-[]->(z:ns0__Term)
WHERE n.rdfs__label = "Calcimycin"
RETURN n, p, q, z
Relation between descriptor (pink), concepts (green), and terms (blue)
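The nesting of descriptors, concepts, and terms can be modeled as plain data classes. The Python sketch below is hypothetical and simplified: the class names are mine rather than the MeSH schema, the Calcimycin data is illustrative rather than copied from the MeSH file, and it assumes (as in MeSH) that the descriptor's preferred term is the preferred term of its preferred concept. It shows how that preferred term and the full list of synonyms follow from the structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    label: str


@dataclass
class Concept:
    preferred_term: Term
    other_terms: List[Term] = field(default_factory=list)

    def terms(self) -> List[Term]:
        # A concept's terms are its synonyms, preferred term first.
        return [self.preferred_term, *self.other_terms]


@dataclass
class Descriptor:
    preferred_concept: Concept
    other_concepts: List[Concept] = field(default_factory=list)

    def concepts(self) -> List[Concept]:
        # Every descriptor has at least one concept: the preferred one.
        return [self.preferred_concept, *self.other_concepts]

    def preferred_term(self) -> Term:
        # Assumption: the descriptor's preferred term is the preferred
        # term of its preferred concept.
        return self.preferred_concept.preferred_term

    def all_labels(self) -> List[str]:
        # Every synonym across all concepts, e.g. for matching text.
        return [t.label for c in self.concepts() for t in c.terms()]


# Illustrative data only, not copied from the MeSH file.
calcimycin = Descriptor(
    preferred_concept=Concept(Term("Calcimycin"), [Term("A23187")]),
)
print(calcimycin.preferred_term().label)  # Calcimycin
print(calcimycin.all_labels())            # ['Calcimycin', 'A23187']
```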
Terms are very useful for labeling text, because they list the synonyms under which a concept can appear. Concepts can describe a meaning that is narrower than the whole descriptor. The descriptor holds the connections to the rest of the graph (the tree, other descriptors, Supplementary Concept Records (SCRs), qualifiers, etc.). For graph algorithms, I will mainly focus on the descriptors.
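To illustrate why terms are handy for labeling text, here is a minimal dictionary-lookup tagger in Python. It is a sketch under simplifying assumptions: the SYNONYMS table is hypothetical (in practice you might fill it from the ns0__prefLabel of the Term nodes), matching is case-insensitive, and at each position the longest matching phrase wins. Real text mining would also have to handle word variants and ambiguous terms.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical synonym table: lower-cased term label -> descriptor label.
SYNONYMS: Dict[str, str] = {
    "eye": "Eye",
    "eyes": "Eye",
    "eyebrows": "Eyebrows",
    "sense organs": "Sense Organs",
}


def tag_text(text: str, synonyms: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (matched phrase, descriptor) pairs, preferring the longest phrase."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    max_len = max(len(key.split()) for key in synonyms)
    hits, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in synonyms:
                hits.append((phrase, synonyms[phrase]))
                i += n
                break
        else:
            i += 1  # no term starts here; move on one word
    return hits


print(tag_text("The eyes are sense organs; eyebrows are not.", SYNONYMS))
# [('eyes', 'Eye'), ('sense organs', 'Sense Organs'), ('eyebrows', 'Eyebrows')]
```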
Tree Structure
Each TopicalDescriptor links to its tree numbers (ns0__treeNumber) and, below the top level, to a broader TopicalDescriptor (ns0__broaderDescriptor).
These two hold very similar information, but they differ in one case: descriptors with multiple tree locations.
A descriptor can sit in more than one tree at the same time, like the descriptor “Eye”. Eye has tree number A01.456.505.420 as a subcategory of Face, and A09.371 as a subcategory of Sense Organs. This can cause problems, because these two tree numbers do NOT have the same subcategories!
Eyebrows are part of the eye as part of the face, but they are NOT part of the eye as a sense organ.
Tree overview of Eye in the online MeSH Browser
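MeSH tree numbers encode the hierarchy in the number itself: each dot-separated segment is one level, so the parent of a tree number is that number with its last segment removed. The Python sketch below (the helper names are my own, and it assumes well-formed tree numbers) lists the ancestors of Eye's two positions and checks whether a made-up child number falls below each of them, which is the reason the two positions have different subcategories.

```python
from typing import List, Optional


def parent(tree_number: str) -> Optional[str]:
    """Parent tree number: drop the last dot-separated segment (None at the top)."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def ancestors(tree_number: str) -> List[str]:
    """All tree numbers above this one, nearest first."""
    chain = []
    current = parent(tree_number)
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain


def is_below(tree_number: str, ancestor: str) -> bool:
    """True if tree_number lies in the subtree rooted at ancestor.
    The trailing "." prevents A01.4560 from counting as below A01.456."""
    return tree_number.startswith(ancestor + ".")


# The two tree numbers of Eye, as given in the text.
for tn in ["A01.456.505.420", "A09.371"]:
    print(tn, "->", ancestors(tn))
# A01.456.505.420 -> ['A01.456.505', 'A01.456', 'A01']
# A09.371 -> ['A09']

# A made-up child number under Eye as part of Face (not a real MeSH code).
child = "A01.456.505.420.999"
print(is_below(child, "A01.456.505.420"), is_below(child, "A09.371"))  # True False
```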
If we use ns0__broaderDescriptor to go up from Eyebrows to the broadest descriptor, we run into a mistake. The broader descriptor of Eyebrows is Eye, which has two broader descriptors of its own (namely Sense Organs and Face). Because the path does not remember which of Eye's positions it came through, Sense Organs ends up as a broader descriptor of Eyebrows, which is not correct, since eyebrows are not a sense organ.
// All descriptors reachable from Eyebrows over one or more broaderDescriptor hops
MATCH (n:ns0__TopicalDescriptor)-[:ns0__broaderDescriptor*]->(p:ns0__TopicalDescriptor)
WHERE n.rdfs__label = "Eyebrows"
RETURN n, p
Sense Organs is found as a broader descriptor of Eyebrows
The other way is to go via the tree numbers. Then Eyebrows is connected to only one of Eye's two tree numbers (the one under Face), so Sense Organs does NOT show up as a broader descriptor.
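// From Eyebrows' own tree numbers (t), climb parentTreeNumber to ancestor numbers (p) and return the descriptors (d) that own them
MATCH (n:ns0__TopicalDescriptor)-[:ns0__treeNumber]->(t:ns0__TreeNumber)-[:ns0__parentTreeNumber*]->(p:ns0__TreeNumber)<-[:ns0__treeNumber]-(d:ns0__TopicalDescriptor)
WHERE n.rdfs__label = "Eyebrows"
RETURN n, t, p, d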
Going via the tree numbers gives only “Body Regions” and “Integumentary System” as the broadest descriptors
For this reason, I will use ns0__treeNumber to find hierarchical relationships rather than ns0__broaderDescriptor.
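The difference between the two traversals can be reproduced outside Neo4j on a toy version of the Eyebrows example. In this Python sketch, the BROADER and TREE_NUMBERS dictionaries are hypothetical, hand-written stand-ins for the ns0__broaderDescriptor and ns0__treeNumber/ns0__parentTreeNumber links; Eyebrows gets a made-up tree number below Eye's A01.456.505.420, and its second branch towards Integumentary System is left out. Following broader descriptors reaches Sense Organs, while walking up from Eyebrows' own tree number does not.

```python
from collections import deque

# Toy data modeled on the Eyebrows example; not an extract of MeSH.
BROADER = {  # descriptor -> broader descriptors (ns0__broaderDescriptor)
    "Eyebrows": ["Eye"],
    "Eye": ["Face", "Sense Organs"],
    "Face": ["Head"],
    "Head": ["Body Regions"],
}
TREE_NUMBERS = {  # descriptor -> tree numbers (ns0__treeNumber)
    "Eyebrows": ["A01.456.505.420.999"],  # hypothetical placeholder number
    "Eye": ["A01.456.505.420", "A09.371"],
    "Face": ["A01.456.505"],
    "Head": ["A01.456"],
    "Body Regions": ["A01"],
    "Sense Organs": ["A09"],
}


def via_broader(name):
    """All descriptors reachable over broaderDescriptor (breadth-first)."""
    seen, queue = set(), deque(BROADER.get(name, []))
    while queue:
        d = queue.popleft()
        if d not in seen:
            seen.add(d)
            queue.extend(BROADER.get(d, []))
    return seen


def via_tree_numbers(name):
    """Descriptors owning an ancestor of one of name's own tree numbers."""
    owner = {tn: d for d, tns in TREE_NUMBERS.items() for tn in tns}
    found = set()
    for tn in TREE_NUMBERS[name]:
        while "." in tn:
            tn = tn.rpartition(".")[0]  # step up to the parent tree number
            if tn in owner:
                found.add(owner[tn])
    return found


print(sorted(via_broader("Eyebrows")))
# ['Body Regions', 'Eye', 'Face', 'Head', 'Sense Organs']
print(sorted(via_tree_numbers("Eyebrows")))
# ['Body Regions', 'Eye', 'Face', 'Head']
```

The tree-number walk never visits A09.371, because that number is not an ancestor of any of Eyebrows' own tree numbers.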
Conclusion
In conclusion, the Medical Subject Headings (MeSH) ontology is a good foundation for a knowledge graph. Because MeSH is published as an RDF file, neosemantics (n10s) can import it into Neo4j with three commands, after which the medical terms and their relationships can be explored with Cypher.
Descriptors, concepts, and terms are the core components of MeSH: descriptors are the broadest units, concepts are narrower meanings within a descriptor, and terms hold the synonyms for each concept. For hierarchical analysis, tree numbers (ns0__treeNumber) are more reliable than broader descriptors (ns0__broaderDescriptor), because the latter lose track of which tree location a descriptor was reached through, which is how Sense Organs ended up above Eyebrows.
Overall, MeSH is a valuable resource for building medical knowledge graphs, and combined with suitable graph analysis techniques it can help researchers draw meaningful insights from medical literature and data.