dave medlock 🧔🏻
Posted on January 6, 2020
I have to begin by admitting that I'm no expert with NoSQL databases. I understand them conceptually and have worked hands-on with MongoDB and AWS DynamoDB in some learning and exploratory environments. But I have yet to build anything substantial using this particular technology, so it's on my list of things to dive deeper into in the coming year.
That said, the concept of being able to store data as a document is a powerful idea in an age where it seems that everything is becoming more connected through the power of REST APIs. At the same time, many companies large and small are still bound to their traditional relational SQL databases which have their own pros and cons.
I have seen some companies take a hybrid approach - deploying an RDBMS like PostgreSQL to handle certain functions while other data sets are stored in MongoDB. The problem is that supporting this increases the complexity of your environment, as well as the costs that go with it. You need people who can administer both and people who can develop with both. You also need to deploy instances of each in your non-production and production environments. And then there's the matter of sharing data between the two.
As a result of the increasing market share that NoSQL databases have started to capture, most SQL database platforms have responded by adding functionality to support storage of and interaction with JSON documents as a built-in feature. This means that in SQL Server, PostgreSQL, Oracle, and MySQL you can now store and interact with JSON documents that are stored in a table, right there among your other columns. This means that you can now contemplate using a single database platform to solve for scenarios where JSON documents are needed as well as places where a relational structure makes more sense.
One caution that I will add at this point: be very intentional in making decisions like these. Just because you can doesn't mean that you should. Give some clear thought to how you will utilize the data you're storing and under what conditions you may need to access it. It's important to use the right tool for the job at hand; otherwise you may come to regret it later and find yourself migrating a bunch of data from an RDBMS table over to a MongoDB instance anyway.
I'm going to do a three-part series on using JSON in relational databases, covering MS SQL Server, PostgreSQL, and MySQL. I decided to do this because I recently ran into a situation where I was contemplating using MongoDB as a data store for an application that already runs on PostgreSQL. I was dreading having to dive in, configure a new server, and complicate my deployment footprint. Then I read about PostgreSQL adding the json and jsonb data types, and in my particular situation it makes perfect sense to utilize these structures. I figured that others might have similar situations, so hopefully you find this series useful.
I'm actually going to start with Microsoft SQL Server because the work I'm doing right this moment is more focused there, so I decided to start with what's top of mind.
Use Case for JSON in an RDBMS
First, I'll describe the use case that I'm going to leverage for this series. I do a substantial amount of work in healthcare and insurance systems. These systems tend to be highly configurable, and it has always struck me that a lot of the configuration work we do fits better in document form than in relational table form. For example, an insurance policy tends to have a large number of attributes, and as policies and regulations change, you find yourself tacking new columns onto the ends of tables. This creates a lot of change in what should ideally be pretty stable data structures. And change can lead to issues in refreshing data warehouses, updating reports, and sharing data.
But the document collection structure that NoSQL databases pioneered makes perfect sense to me for things like policy attributes, benefit plans, and provider contracts. A big part of the reason for that is that these structures often come from actual, real life documents like an Explanation of Benefits / Summary of Benefits, Insurance Policy, or an actual contract between a provider and a health plan. These documents are often a combination of various statements that might easily be represented as an attribute or set of attributes combined with tables that might outline things like fee schedules for specific services. All of that translates really well to a document structure like what we find in a JSON document.
So for this series we're going to use the example of an (extremely simplified) provider contract. We'll have a providers table that holds a provider name and NPI, and a provider_contracts table that relates to the providers table through a foreign key. The provider_contracts table will store contract terms as a JSON document, a contract effective date, and a contract termination date. (Note that we could store those dates in the JSON document itself, but because every contract has these values, we're putting them in as separate fields. This is unlike attributes such as Out of Area and Out of Network, which only some contracts have, so those will live in the JSON document.) So the structure will look like:
- providers
  - id (bigint, pk)
  - provider_name (varchar(100))
  - npi (varchar(10))
- provider_contracts
  - id (bigint, pk)
  - provider_id (bigint, fk to providers.id)
  - contract_terms (nvarchar(max)) - this is where our JSON document will go
  - contract_eff_date (datetime)
  - contract_term_date (datetime)
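Sketched as T-SQL DDL, that structure might look like the following. (The IDENTITY columns and constraint details are my own assumptions; the structure above doesn't spell them out.)

```sql
-- Sketch of the schema above; IDENTITY seeds and constraints are assumptions.
CREATE TABLE providers (
    id BIGINT IDENTITY(1,1) PRIMARY KEY,
    provider_name VARCHAR(100) NOT NULL,
    npi VARCHAR(10) NOT NULL
);

CREATE TABLE provider_contracts (
    id BIGINT IDENTITY(1,1) PRIMARY KEY,
    provider_id BIGINT NOT NULL REFERENCES providers (id),
    contract_terms NVARCHAR(MAX) NULL, -- this is where our JSON document will go
    contract_eff_date DATETIME NOT NULL,
    contract_term_date DATETIME NOT NULL
);
```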
In the JSON document, we're going to keep that fairly simple. We're going to capture an array of contracted services. A contracted service can have a range of service codes and a fee schedule which could either be a flat rate or a percentage of the billed amount. (We're not building a full-fledged real-world claim system so we're drastically simplifying.) The JSON documents that we'll work with will be the following:
Princeton Plainsboro Teaching Hospital Contract:
{
"out_of_area": "true",
"service_rates": [
{"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"},
{"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"},
{"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}
]
}
Sacred Heart Hospital Contract:
{
"out_of_network": "true",
"service_rates": [
{"service_description": "Misc Services", "service_code_range_start": "81200", "service_code_range_end": "82000", "billed_percentage": "0.6132"}
]
}
Overview of Working With JSON in Microsoft SQL Server
Microsoft SQL Server added support for JSON in MS SQL Server 2016. This came in the form of a set of functions that are used to parse, modify, query, and select from JSON documents that are stored in NVARCHAR fields.
There are five functions used for working with JSON documents in SQL Server:
- ISJSON(column/variable) - this will return a 1 if the contents of the column or variable you provide are a valid JSON document, a 0 if the contents are not valid JSON, and NULL if the contents are NULL.
- JSON_VALUE(column/variable, JSON path) - this will return a scalar value from the path you provide, if that path has a valid scalar value. Note that if you try to use JSON_VALUE to return an array within the JSON document, you're going to get a NULL.
- JSON_QUERY(column/variable, JSON path) - this will return a JSON object from the path you provide. You can use this to return an array from within the JSON document, for example.
- OPENJSON(column/variable, optional JSON path) - this function allows you to essentially convert a portion of a JSON document to a table. We'll use examples where we combine this with a WITH clause to map columns in the table to specific JSON paths.
- JSON_MODIFY(column/variable, JSON path, new value) - this allows you to modify a value within a path. We won't spend much time on this at the moment, but there is plenty of documentation available if you find that you need it. In my view, solutions using these functions should be restricted to situations where you're manipulating the JSON document within the application and using SQL primarily for storage and potentially some querying, so I don't want to go too far into this function right now.
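As a quick, hypothetical illustration of the first and last of these (neither snippet is part of the example we build below): ISJSON works nicely in a CHECK constraint to reject invalid documents at write time, and JSON_MODIFY can update a single value in place:

```sql
-- Hypothetical: reject invalid JSON documents at write time.
ALTER TABLE provider_contracts
    ADD CONSTRAINT chk_contract_terms_is_json CHECK (ISJSON(contract_terms) = 1);

-- Hypothetical: flip the out_of_area flag on contract 1.
UPDATE provider_contracts
SET contract_terms = JSON_MODIFY(contract_terms, '$.out_of_area', 'false')
WHERE id = 1;
```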
Let's start by inserting our providers:
INSERT INTO providers (provider_name, npi) VALUES ('Princeton Plainsboro Teaching Hospital', '0123456789')
INSERT INTO providers (provider_name, npi) VALUES ('Sacred Heart Hospital', '1234560987')
Our providers table now has PPTH as id = 1 and Sacred Heart as id = 2. Let's insert contracts for each of them now. Note that I'm using the JSON documents we defined above in our "contract_terms" column:
INSERT INTO provider_contracts (provider_id, contract_terms, contract_eff_date, contract_term_date)
VALUES (
1,
'{ "out_of_area": "true", "service_rates": [ {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"}, {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"}, {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}] }',
'20190101 00:00:00',
'20201231 23:59:59'
)
INSERT INTO provider_contracts (provider_id, contract_terms, contract_eff_date, contract_term_date)
VALUES (
2,
'{ "out_of_network": "true", "service_rates": [{"service_description": "Misc Services", "service_code_range_start": "81200", "service_code_range_end": "82000", "billed_percentage": "0.6132"}] }',
'20200101 00:00:00',
'20201231 23:59:59'
)
Now, let's run a query using one of our new JSON functions to make sure that everything looks good.
SELECT id, provider_id, ISJSON(contract_terms) AS is_valid_json
FROM provider_contracts
If we did everything right, then we have two rows and the is_valid_json column in both rows is equal to 1, meaning that we have valid JSON documents in those fields.
Now let's run a query to find any providers that are out of area. We're going to do this by using the JSON_VALUE function to select our out of area and out of network fields from the JSON document and we're going to use the same function in the WHERE clause to filter for out of area = true.
SELECT id, JSON_VALUE(contract_terms, '$.out_of_area') AS out_of_area,
JSON_VALUE(contract_terms, '$.out_of_network') AS out_of_network
FROM provider_contracts
WHERE JSON_VALUE(contract_terms, '$.out_of_area') = 'true'
In this case, we're going to get one record (id = 1 for me; this is the PPTH contract) and out_of_area will be 'true'. The out_of_network column is null because that value doesn't exist in the PPTH contract document. This is exactly what we would expect - the JSON_VALUE function doesn't return an error, it simply tells us that the value at that path doesn't exist.
One thing that I will note here is that if I want to get something specific from our service_rates array, I'm going to have to specify all the way down to the specific value within the array that I want. For example, if I want the flat rate from the first service rate ("Office Visits 1"), I'm going to have to do something like this:
SELECT p.id, pc.id AS contract_id,
p.provider_name,
JSON_VALUE(pc.contract_terms, '$.service_rates[0].flat_rate') AS office_visit_rate
FROM provider_contracts pc
INNER JOIN providers p ON p.id = pc.provider_id
WHERE JSON_VALUE(pc.contract_terms, '$.service_rates[0].service_description') = 'Office Visits 1'
If I try to get index [3] from the array, I get a null because the indexes will only go up to 2 (because there are three service rates in the contract). The problem here is that it requires that I actually know where in the array my Office Visits 1 service definition is. This isn't very practical and makes it difficult to work with. We'll see how we have to work around this when we get into the OPENJSON function.
Next, let's say that I want to return a provider JSON document in my query that includes the provider details and the contract details as well. This is a good place to use JSON_QUERY. We start by creating what looks like a perfectly normal select statement with an inner join between providers and provider_contracts. But then we wrap our contract_terms field in JSON_QUERY and give it an alias (contract_terms). And at the very end we tack on FOR JSON AUTO. This tells SQL Server that we want it to return a JSON object and that it should formulate the structure automatically based on the order of our select statement and the table structure. If we use AUTO, we don't have any further control over what SQL Server gives us; we can use PATH in its place if we want more control. (Note that with PATH we can't have two columns with the same name, so we'd have to alias our [contracts].id field.) Finally, note that we can use the ROOT option to specify what we want the top level/root of the document to be - here we're going to use providers. Without it, we get back a bare array of JSON objects; with ROOT('providers') we get a providers object that contains an array of providers.
SELECT p.id,
p.provider_name,
p.npi,
[contracts].id,
JSON_QUERY([contracts].contract_terms, '$') AS contract_terms,
[contracts].contract_eff_date,
[contracts].contract_term_date
FROM providers p
INNER JOIN provider_contracts [contracts] ON p.id = [contracts].provider_id
FOR JSON AUTO, ROOT('providers')
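For comparison, the PATH variant might look like the following sketch. The dotted aliases nesting the contract fields under a contract object are my own choice; note that the contract id now needs its own alias:

```sql
-- Sketch of the PATH variant; dotted aliases nest fields under "contract".
SELECT p.id,
       p.provider_name,
       p.npi,
       pc.id AS [contract.contract_id],
       JSON_QUERY(pc.contract_terms, '$') AS [contract.contract_terms],
       pc.contract_eff_date AS [contract.eff_date],
       pc.contract_term_date AS [contract.term_date]
FROM providers p
INNER JOIN provider_contracts pc ON p.id = pc.provider_id
FOR JSON PATH, ROOT('providers')
```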
Finally, let's say that we have a service code and we want to retrieve the service rate to be used for calculating a payment on that service. Say we get service code 99215 for PPTH. By looking at our documents, we know we should get the Office Visits 1 service rate, which pays a flat rate of $45.00 for services in the range of 99211 to 99221. Remember how we said that using JSON_VALUE requires us to know exactly where in the document our data is? To query the way we need to for this task, we need to use OPENJSON to essentially convert our JSON document into a table. Then we can filter that table to get the values we need, just like we would in any other query.
The first thing I'm going to have to do is declare a variable as an NVARCHAR(MAX). Then, I select my contract terms field (using JSON_QUERY to wrap it and retrieving the service rates structure from the contract_terms). I'm assuming here that I already know which contract ID I need to retrieve (1).
DECLARE @json NVARCHAR(MAX)
SELECT @json = JSON_QUERY(contract_terms, '$.service_rates') FROM provider_contracts WHERE id = 1
PRINT @json
I'm printing out the value of the variable just to check that I've got what I expected.
[ {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"}, {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"}, {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}]
Now, I can use this @json variable in a query. I'm going to pass @json to the OPENJSON function, which will essentially convert that document to a table for me. I'm using the WITH clause to explicitly specify the scalar value fields I want back and the path where those scalar value fields can be found. I'm aliasing this table that's returned from OPENJSON as "js".
I then use CROSS APPLY to get other fields from my provider_contracts field (making sure to specify the provider contract ID in my WHERE clause, the same as I did when I retrieved the JSON). I do an INNER JOIN to my providers table so I can get my provider name and NPI fields. I then add fields in my WHERE clause to get the service rates where my service code falls within the range. (Try changing the code in both of the WHERE statements from 99215 to 00102 and see the difference in results.)
DECLARE @json NVARCHAR(MAX)
SELECT @json = JSON_QUERY(contract_terms, '$.service_rates') FROM provider_contracts WHERE id = 1
PRINT @json
SELECT p.provider_name, p.npi, pc.id, pc.provider_id,
pc.contract_eff_date, pc.contract_term_date, js.*
FROM OPENJSON(@json)
WITH (
svc_description nvarchar(100) '$.service_description',
svc_range_start nvarchar(100) '$.service_code_range_start',
svc_range_end nvarchar(100) '$.service_code_range_end',
flat_rate decimal(12, 2) '$.flat_rate',
billed_percentage decimal(12, 4) '$.billed_percentage'
) AS js
CROSS APPLY provider_contracts AS pc
INNER JOIN providers p ON p.id = pc.provider_id
WHERE pc.id = 1
AND js.svc_range_start <= '99215'
AND js.svc_range_end >= '99215'
You can see here how we can convert a JSON document stored in an NVARCHAR field into a table and then use it the way we would any other table structure. (I'm fairly certain there's a better way to do this, so I'd love to hear alternative approaches.) We could wrap the code above in a stored procedure that takes the service code and the contract ID (or the provider ID and service date, with a few tweaks) and have the system return the contract details we need to calculate payment for the selected service in a table structure.
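That stored procedure might be sketched as follows. The procedure name and parameters are my own invention, and as one alternative approach I've inverted the pattern to CROSS APPLY OPENJSON directly against the contract_terms column, which avoids the intermediate @json variable:

```sql
-- Sketch: return the matching rate row for a contract and service code.
-- The procedure name and parameters are illustrative, not from the text.
CREATE PROCEDURE get_service_rate
    @contract_id BIGINT,
    @service_code NVARCHAR(10)
AS
BEGIN
    SELECT p.provider_name, p.npi, pc.id, pc.provider_id,
           pc.contract_eff_date, pc.contract_term_date, js.*
    FROM provider_contracts pc
    INNER JOIN providers p ON p.id = pc.provider_id
    CROSS APPLY OPENJSON(pc.contract_terms, '$.service_rates')
    WITH (
        svc_description nvarchar(100) '$.service_description',
        svc_range_start nvarchar(100) '$.service_code_range_start',
        svc_range_end nvarchar(100) '$.service_code_range_end',
        flat_rate decimal(12, 2) '$.flat_rate',
        billed_percentage decimal(12, 4) '$.billed_percentage'
    ) AS js
    WHERE pc.id = @contract_id
      AND js.svc_range_start <= @service_code
      AND js.svc_range_end >= @service_code;
END
```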
There's much more to learn about using JSON within MS SQL Server. I encourage you to play around with these functions and see what more you can get out of them.
I will say that this approach in general feels very kludgey to me. It seems like we're writing a lot of code here to accomplish something that should be easier. At the same time, the JSON document gives us a lot of flexibility within this contract structure that might be useful. I tend to look at the functionality SQL Server provides as convenient if you need it, but I doubt that I would intentionally build something on top of it. The primary reason I'd be interested in storing JSON in a SQL Server database like this would be so that an application can retrieve it and use it from there. However, there's always the possibility that I'd want or need some backend process to operate on the data (e.g. data migrations, cleanup, updates, etc.), and in that case, SQL Server gives me a mechanism for doing so.
In the next part of the series I'm going to cover PostgreSQL's json and jsonb data types. Unlike Microsoft, the PostgreSQL community decided to make new data types and as a result I think we're going to see a lot more flexibility and function with that approach. Until next time, happy coding!