Load external data into OPA: The Good, The Bad, and The Ugly
OdedBD
Posted on April 4, 2022
There are several ways to create a data fetching mechanism for OPA - each of them has its pros and cons. To make sense of these different methods, I've decided to create this guide that will help you figure out which data fetching method would be best for you, with full knowledge of each method’s ‘good, bad, and ugly’ aspects.
TL;DR -
The methods that we are going to review are:
1. Including data in JWT tokens
2. Overloading input for OPA within the query
3. Polling for data using bundles
4. Pushing data into OPA using the API
5. Pulling data using OPA during policy evaluation
6. OPAL (Open Policy Administration Layer)
Before we dive into details, let’s first cover some basics -
What is OPA
Authorization is becoming increasingly complicated - applications are getting bigger and need to handle more users than ever before, and policies are becoming more complex, depending on multiple factors (like a client's location, the time of the action, user roles, and relations to resources).
This is where OPA (Open Policy Agent) comes in - OPA is a great open-source tool that allows us to evaluate complicated policies. It's fast, part of the CNCF (which means it adheres to CNCF's guidelines and standards), and is used for handling permissions in some of the largest companies in the world (e.g. Netflix, Pinterest, and others). You can check out an introduction to OPA here.
How OPA Works with Data
Managing policies with OPA often requires relevant contextual data - information about the user, the resource they are trying to access, etc. Without this information, OPA cannot make correct policy decisions.
For example - a policy that states “Only paying users can access this feature” requires OPA to have information on:
Who my users are
Which of them are paying users, and which aren't
A policy that states “Users in the application can only access their own photos, or those of their children” requires OPA to know:
Who the application's users are
Which user is a parent, which user is a child, and who relates to whom
Which photo belongs to each user
Having access to this contextual data is thus critical for OPA to be able to make its decisions.
The bottom-line question is - how can we bring this data into OPA, and what is the most effective way to do so?
The data fetching mechanism: Basic requirements
Before we dive into the different methods of fetching data for OPA, let’s agree on a couple of basic guidelines for how this data fetching mechanism should work:
It needs to be able to handle policy data at scale
Because data can come from many sources and can get very complex very quickly, we want this mechanism to be as easily manageable as possible
The data fetching mechanism needs to be operational in real-time (this is crucial to avoid a "new enemy attack" - a situation where a user whose permissions were revoked can still access sensitive data because the update has not propagated in time between the different parts of the system)
It should be easy to maintain because the need for access control is here to stay and is likely to evolve in the future.
Now that we've established some basic requirements, let’s dive into the various data fetching mechanisms we can utilize to solve our issue in the most efficient way.
Let’s dive in!
1. Including data in JWT tokens:
JSON Web Tokens (JWT) allow you to securely transmit signed JSON data between software systems and are usually produced during the authentication process. JWTs can be sent to OPA as inputs thus enabling OPA to make decisions about a Policy query.
For example, this is what a JWT with authorization data looks like:
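Since the signed token itself is an opaque string, here is a minimal sketch (in Python, using the PyJWT library; the claim names and values are illustrative, chosen to match the query example later in this post) of producing and decoding such a token:

import jwt  # pip install PyJWT

# Hypothetical claims: the user's roles plus the images related to them
payload = {
    "sub": "user-123",
    "roles": ["pro"],
    "related_images": ["image0.png", "image1.png"],
}

token = jwt.encode(payload, "secret-key", algorithm="HS256")
print(jwt.decode(token, "secret-key", algorithms=["HS256"]))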
The header (the first part) specifies the algorithm for the secure signing, and in the payload (the middle part) we can see the roles and related images used for our authorization.
The good:
JWTs are an easy-to-use, well-known technology that you probably already utilize in your system (as part of the authentication layer).
The bad:
JWTs have a size limit - not everything can be encoded into a JWT. While it looks fine in the example presented above, if a user has 1000 files, the JWT length grows from 239 characters to 20,057 - and that's with simple file names; with full paths it's even longer. Additionally, a JWT created during the authentication phase doesn't include all the information required to make the policy decision - especially if you are using a vendor like Auth0 to authenticate. And storing data in JWTs means we have to refresh the token (read: log out and log in again) every time we want to update the data.
The ugly:
You might think it's a good idea to start with JWTs because you don't have a lot of data - but as time goes by, the amount of data grows, and the situation easily spirals out of control, with enormous JWTs attached to every request.
Bottom-line:
JWTs are ideal for simple identity-related data, and in general, it’s best to think of the claims and data in the JWT as hints about identity (given by the Identity-Management and Authentication layers) rather than verbatim data for authorization.
2. Overloading input for OPA within the query:
Another option is to attach the relevant data as input to every policy query sent to OPA.
It will look something like this in Python pseudocode wrapping OPA:
def delete_image(user_id, image_id):
    policy_json_data = {}
    policy_json_data["user_roles"] = get_user_roles(user_id)    # returns a list of roles, e.g. ["editor"]
    policy_json_data["user_images"] = get_user_images(user_id)  # returns a list of images, e.g. ["img.png"]
    # sends a request that looks like this:
    # curl localhost:8181 -i -d '{"roles": ["pro"], "related_images": ["image0.png", "image1.png"], "image_id": "image2.png"}' -H 'Content-Type: application/json'
    # and returns True / False
    permitted = check_opa_policy(policy_json_data, "delete", image_id)
    if not permitted:
        raise AuthorizationError
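For completeness, here is a hedged sketch of what the check_opa_policy wrapper itself might look like - assuming a local OPA agent and a policy package named authz with an allow rule (both assumptions, not part of the original example):

import requests

OPA_URL = "http://localhost:8181"  # assumption: OPA agent running locally

def check_opa_policy(policy_json_data, action, resource_id):
    # Hypothetical wrapper: POST the query input to OPA's Data API
    query_input = dict(policy_json_data, action=action, image_id=resource_id)
    response = requests.post(
        f"{OPA_URL}/v1/data/authz/allow",  # assumption: an `authz` package with an `allow` rule
        json={"input": query_input},
    )
    return response.json().get("result", False)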
The good:
Using this method is simple, and it ensures that only the relevant data is cherry-picked for each query, avoiding the need to load or store a lot of data in OPA.
The bad:
This method prevents us from following one of the most important best practices in building authorization - decoupling policy and code - as our code now has to take on the responsibility of tailoring the data for OPA. Having policy and code mixed together in one layer creates a situation where we struggle to upgrade, add capabilities, and monitor the code overall, as it is replicated across different microservices. Each change would require us to refactor large areas of code that only drift further apart as these microservices develop.
The ugly:
Having so much code repetition is the antithesis of the DRY principle - creating a multitude of complications and difficulties as our application evolves. Considering the example code above, for instance, very similar code would be written for delete_image, update_image, and get_image.
Bottom-line:
In general, it is best to reserve this method for simple cases, or to use it to augment more advanced methods with cherry-picked data.
3. Polling for data using Bundles
OPA's bundle feature periodically checks for and downloads policy bundles from a centralized server; these bundles can include both data and policies. A simple way to implement this solution is to run an Nginx container that serves the bundle files and configure OPA to fetch them from it (using S3 buckets is also a common pattern). The configuration for OPA would look as follows:
services:
  nginx:
    url: https://my-nginx.example.com
    credentials:
      bearer:
        token: dGVzdGluZzp0ZXN0aW5n
        scheme: Basic

bundles:
  authz:
    service: nginx
    resource: /bundle.tar.gz
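As a rough illustration of what creating the bundle involves, here is a minimal Python sketch that produces a bundle.tar.gz containing a data.json at its root (the data content is a made-up example):

import io
import json
import tarfile

# Made-up example data; a real bundle would hold your policy data (and optionally .rego files)
data = {"users": {"user1": {"roles": ["pro"]}}}

with tarfile.open("bundle.tar.gz", "w:gz") as bundle:
    payload = json.dumps(data).encode()
    info = tarfile.TarInfo(name="data.json")  # OPA reads data.json from the bundle root
    info.size = len(payload)
    bundle.addfile(info, io.BytesIO(payload))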
The good:
It allows you to load large amounts of data (much larger than the two previous methods allow), its delta bundle feature lets you sync only new data (though not policy), it gives you a single source of truth, and it is more readable than JWTs.
The bad:
Using bundles doesn't cut it when data changes rapidly, as it requires triggering a full policy update for every small change - making this a very inefficient process.
The ugly:
Even with the new delta bundle feature, you still need to create and manage the bundles on your own, and it works by polling, which isn't real-time.
In addition, being dependent on a polling interval means you have to choose between rapid polling, which can result in high costs, and slow polling, which can lead to delays and a risk of inconsistency.
The bottom line:
For cases where data updates mainly arrive as part of the CI/CD cycle, bundles are a great option. Bundles can also work well for static or mostly static applications. For modern dynamic applications, this option might be too slow or inefficient on its own.
4. Pushing data into OPA using the API
You can also push policy data into OPA with an API request - this approach is similar in most respects to the bundle API, except that it allows you to optimize for update latency and network traffic. It will look something like this in Python pseudo-code:
import requests

def send_user_update_to_opa(users):
    # PUT the full user list into OPA's data document at /v1/data/users
    requests.put(f"{opa_url}/v1/data/users", json=users)

def callback_on_new_user():
    all_users = get_all_users()
    send_user_update_to_opa(all_users)
In this example, we update OPA's user list each time the new-user creation callback fires.
The good:
This way you don't need to load the entire bundle on every update - you can update just part of it, which is much more performant in terms of memory and network usage, and gives you more control over how you manage distributed data in OPA.
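For instance, a partial update can go through OPA's Data API PATCH endpoint with a JSON Patch body instead of re-sending the whole document (a sketch; the /users path and payload are assumptions):

import requests

def add_user_to_opa(opa_url, user):
    # Append one user to the existing list rather than replacing the whole document
    requests.patch(
        f"{opa_url}/v1/data/users",
        json=[{"op": "add", "path": "/-", "value": user}],
        headers={"Content-Type": "application/json-patch+json"},
    )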
The bad:
Applying this method to import new kinds of data from different data sources requires a continuous effort of writing large amounts of integration code.
The ugly:
This method requires continuous maintenance - you can’t just set it up and forget about it. If left abandoned, this code will very quickly become obsolete.
The bottom line:
This is a great way to load data into OPA dynamically, but it requires a lot of development and administration in all but very simple cases.
5. Pulling data using OPA during Policy Evaluation
OPA includes a built-in function (http.send) that allows it to reach out to external HTTP servers during evaluation and request additional data. It will look something like this in Rego pseudo-code:
default allow = false

allow = true {
    input.method == "GET"
    input.path = ["getSalary", user]

    # http.send takes a request object; get_managers_url is a placeholder endpoint
    response := http.send({"method": "GET", "url": get_managers_url})

    # the requesting user's managers must include the salary's owner
    managers := response.body[input.user]
    managers[_] == user
}
You can see the call to http.send that returns the list of managers needed to evaluate the policy. Similarly, you can embed additional functions into OPA as a plugin to fetch data from other sources as part of a query.
The good:
This is a solid option to use when you have a very large volume of data that is required to make permission decisions, and you cannot load it all into OPA.
The bad:
Using this method puts a strain on OPA, as it always comes with network latency that slows down every policy evaluation. Additionally, this method is prone to network errors.
The ugly:
Error handling in Rego isn't simple at all, and relying on this feature can lead to some frustrating results. While OPA and Rego can evaluate policies very quickly, you may want to avoid adding more logic than you need.
The bottom line:
This is a great way to load data into OPA in a highly dynamic way without writing a lot of code. That being said, this solution is not applicable when the relevant data requires parsing or edge case handling, which Rego lacks.
6. OPAL (Open Policy Administration Layer):
OPAL is an open-source project for administering authorization and access control for OPA. OPAL responds to policy and data changes, pushes live updates to OPA agents, and thus brings open policy up to the speed needed by live applications.
To run OPAL with OPA, you can simply use the Docker example. Send an update to OPAL on every change in your data, or connect your data source's webhook to OPAL and let OPAL stream the updates to OPA.
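As a hedged sketch (the server URL, data-source URL, and destination path are assumptions; the payload shape follows OPAL's data-update API), a callback could trigger such an update like this:

import requests

OPAL_SERVER_URL = "http://localhost:7002"  # assumption: OPAL server's default port

def on_users_changed():
    # Ask OPAL to re-fetch our users endpoint and stream the result into OPA at /users
    update = {
        "entries": [
            {
                "url": "https://my-api.example.com/users",  # hypothetical data source
                "topics": ["policy_data"],
                "dst_path": "/users",
            }
        ]
    }
    requests.post(f"{OPAL_SERVER_URL}/data/config", json=update)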
The good:
OPAL includes live updates and Git tracking (GitOps) and saves you the hassle of having to write all the code by yourself like in the ‘Pushing Data with API’ option.
The bad:
OPAL is a fairly new library, it might take some time to learn and some work to integrate into your project.
The ugly:
First of all, OPAL is beautiful (But being one of the contributors to this open source project I might be biased). That being said, the architecture can be a bit more complicated than bundle server/JWTs, so you might need to take your time and make sure you understand it.
The bottom line:
OPAL is inspired by the way companies like Netflix work with OPA, but it requires some work to set up. Simple applications will do better with one of the other methods, but for full modern applications, OPAL is probably the more robust/reliable option.
Conclusion
There are various methods for building data fetching mechanisms - each with its own pros and cons.
Some of these methods (including data in JWT tokens and overloading the input for OPA within the query) are only useful in simple cases, while others (polling for data using bundles) lack effectiveness in dynamic applications. Pushing data with the API is a good way to load data into OPA dynamically, though it requires a lot of development and administration, and pulling data during policy evaluation is not applicable when the relevant data requires parsing or edge-case handling. OPAL has the advantage of being a more robust/reliable solution, but it requires you to adopt a new open-source technology.
The most important thing to take from this review is understanding the complexities and challenges of building data fetching mechanisms correctly, and understanding that every method has its pros and cons.
Still not sure which method is the right one for you and your architecture? Need someone to brainstorm with? Don’t be shy - reach out to us on our Slack community.