How Reddit Built Authorization with OPA

Introduction

Managing authorization effectively in advertising technology (Ad Tech) is a complex and crucial task. To tackle this challenge, Reddit developed an advanced authorization system for its advertising platform, a process described in detail by Staff Engineer Braden Groom in this Reddit post.

Inspired by Braden’s post, this blog explores the journey of Reddit's team, focusing on their strategic decisions, the challenges they encountered, and the innovative solutions they crafted. Alongside Reddit's in-house efforts, we also examine OPAL, an open-source solution that aligns with the functionality of Reddit’s system, presenting an alternative approach for organizations seeking sophisticated authorization management solutions in the field of Ad Tech.

Authorization in Ad Tech

Advertising technology consists of a wide range of digital strategies and platforms for targeting, delivering, and analyzing online advertisements. From advertisers and publishers to ad exchanges and networks, each of these entities manages vast amounts of data and a range of transactions with multiple stakeholders involved.

One of the critical challenges in Ad Tech is authorization. Unlike simpler systems where access control can be relatively straightforward, authorization in Ad Tech platforms requires a more sophisticated approach.

Advertisers have distinct expectations of how their accounts should be structured and who should access them. These systems should not only be thoroughly secure but also flexible enough to accommodate a wide range of requirements. They need to manage who (or what, in the case of automated systems) has access to which parts of the platform, under what conditions, and to what extent. This requirement for detailed and dynamic access control is where basic authorization solutions often fall short, necessitating more advanced, granular solutions.

Reddit’s Ad Tech platform as a case study

Reddit is one of the largest online platforms out there, with its advertising platform being an integral part of its business model, allowing businesses to tap into its vast and varied user base. The platform enables advertisers to target audiences based on interests, demographics, and behaviors.

Because much of Reddit’s content is public and does not require a complex authorization system, no generalized authorization service existed within the company. The ads team therefore started exploring a homegrown solution within the ads organization.

In his post, Braden Groom, a Staff Engineer at Reddit, describes the unique challenges Reddit faced in crafting an effective authorization system for its advertising platform. These challenges stem from the need to cater to a diverse range of advertiser requirements and the complex nature of digital advertising itself.

Let’s start by seeing what their requirements were:

Reddit’s Authorization Requirements

In developing its authorization system for the advertising platform, Reddit identified several crucial requirements that would guide its approach. These requirements were essential in ensuring that the system would not only meet their current needs but also be adaptable for future challenges:

Low latency

Every action on Reddit’s advertising platform necessitates a rapid authorization check. This requirement for low latency is crucial to ensure a seamless experience for users and advertisers alike.

Availability

An outage in the authorization service could mean a complete halt in the operation of the advertising platform, making it impossible to perform essential authorization checks. Therefore, high uptime is critical to maintain the continuous functioning of the platform.

Auditability

For security and compliance, a detailed log of all decisions made by the authorization service is necessary. This aspect of auditability is not just a regulatory requirement but also a fundamental component of managing this system, especially in the case of unauthorized access. It allows Reddit to track and review authorization decisions, ensuring that the system is functioning correctly and adhering to all necessary policies and regulations.

Flexibility

Reddit, like every player in the advertising landscape, must frequently evolve to meet the expectations of its advertising partners, which includes letting them define and manage their own roles. Therefore, the authorization system must be flexible and adaptable to changing requirements without significant overhauls.

Multi-Tenancy (stretch goal)

While not an explicit initial requirement, Reddit aimed for a multi-tenant capability in their authorization system. This goal was set with the understanding that a generalized authorization solution was lacking at Reddit, and thus, a system that could address multiple use cases across the company would be beneficial. Although focused on the advertising platform, this stretch goal would enhance the system’s flexibility and scalability, allowing it to potentially serve various needs across Reddit as a whole.

With the requirements laid out before us, there is one more important challenge, unique to Ad Tech, to consider:

Authorization for Anonymous Identities

An additional significant challenge with advertising on a platform like Reddit comes from the fact that a large portion of user interaction occurs through anonymous identities, as you don’t have to create an account in order to browse content (and see ads while you do). This presents a unique challenge in the context of advertising authorization.

When it comes to advertising, the platform needs to perform authorization checks to determine which ads to show to which users. These checks are straightforward when dealing with registered users, as their profiles, preferences, and histories can guide ad targeting. However, for anonymous users, the platform lacks this personalized data, requiring a different approach to authorization and ad targeting.

So how did the Reddit team seek to tackle these challenges, and what can we learn from their experience? Let’s start with the first principle they decided to implement:

Decoupling Policy from Code

Inspired by Google Zanzibar, the Reddit team’s first design decision was to decouple policy from code. Separating authorization logic from the rest of the application code is a recommended best practice for various reasons.

The Reddit team explained that this choice meant their database performs no rule evaluation when fetching rules at query time, keeping the query patterns “simple, fast, and easily cacheable”. Rule evaluation happens only in the application, after the database has returned all of the relevant rules. Keeping policy storage and policy evaluation clearly isolated also means either one could be replaced with relative ease in the future.

To achieve that, they decided to use the Open Policy Agent (OPA) policy engine.

Open Policy Agent

Open Policy Agent (OPA) is an open-source, general-purpose policy engine that decouples policy decision-making from policy enforcement. It provides a high-level declarative language (Rego) to specify policy as code and APIs to offload decision-making from your software.

OPA was already in use at Reddit for Kubernetes-related authorization tasks. It also provides a testing framework that the Reddit team could use to enforce 100% coverage for policy authors.

Using OPA gave the Reddit team centralized rule management: policies are defined in a unified manner and applied consistently across various parts of the system, which keeps them manageable.

OPA’s architecture also enabled policies to be enforced anywhere in the system without requiring the policy engine to be co-located with the service making the decision. This is aligned with Reddit's approach of separating rule retrieval from rule evaluation.

An example of Rego code representing an ABAC policy 👇🏻

package abac

# Rego v1 syntax ("if" keyword); required by default from OPA 1.0 onward
import rego.v1

# User attributes
user_attributes := {
    "Squid": {"tenure": 10, "title": "cashier"},
    "Pat": {"tenure": 0.5, "title": "cashier"}
}

# Menu attributes
menu_attributes := {
    "Burger": {"items": "Menu", "price": 3},
    "Shake": {"items": "Menu", "price": 1}
}

# Deny unless one of the rules below matches
default allow := false

# All cashiers may process orders of up to $1 total.
allow if {
    # Look up the user's attributes
    user := user_attributes[input.user]
    # Check that the user is a cashier
    user.title == "cashier"
    # Check that the item being sold (passed as input.ticker) is on the menu
    menu_attributes[input.ticker].items == "Menu"
    # Check that the processed amount is at most $1
    input.amount <= 1
}

# Cashiers with >= 1 year of experience may process orders of up to $10 total.
allow if {
    # Look up the user's attributes
    user := user_attributes[input.user]
    # Check that the user is a cashier
    user.title == "cashier"
    # Check that the item being sold (passed as input.ticker) is on the menu
    menu_attributes[input.ticker].items == "Menu"
    # Check that the user has at least 1 year of experience
    user.tenure >= 1
    # Check that the processed amount is at most $10
    input.amount <= 10
}
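
To make the "offload decision-making" part concrete, here is a minimal Python sketch of an application asking a running OPA instance to evaluate the policy above through OPA's REST Data API. It assumes OPA is reachable at http://localhost:8181 (its default listen address) with the abac package loaded. The helper names build_request and is_allowed are illustrative and not part of any library.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # assumption: OPA's default listen address


def build_request(user: str, ticker: str, amount: float) -> urllib.request.Request:
    """Build a Data API request asking OPA to evaluate data.abac.allow."""
    # The keys under "input" are the ones the policy reads: user, ticker, amount.
    body = json.dumps({"input": {"user": user, "ticker": ticker, "amount": amount}})
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/abac/allow",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(user: str, ticker: str, amount: float) -> bool:
    """Send the request to OPA and return its decision (requires a running OPA)."""
    with urllib.request.urlopen(build_request(user, ticker, amount), timeout=2) as resp:
        decision = json.load(resp)
    # OPA omits "result" when the rule is undefined; treat that as a deny.
    return decision.get("result", False) is True
```

With the sample data in the policy, a call such as is_allowed("Squid", "Burger", 5) would be expected to return True, while the same order placed by "Pat" would be denied. The application only asks the question; all of the decision logic stays in Rego.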

Another decision made by the Reddit team was to build a centralized service instead of a system of sidecars. While the sidecar approach was viable, it seemed unnecessarily complex for their needs, so they opted for a centralized service to keep maintenance costs down.

Modeling Policy Rules

As outlined in the requirements of the Reddit team, it was crucial for them to create a highly flexible system capable of accommodating the evolving needs of their advertising platform.

To do that, developers often use common authorization models such as RBAC, ABAC, and ReBAC. The Reddit team decided to take a more abstract approach, creating rules consisting of three fields describing access policies:

Subject: Describes who or what the rule pertains to.
Action: Specifies what the subject is allowed to do.
Object: Defines what the subject may act upon.

Two more fields represent different layers of isolation:

Domain: Represents the specific use-case within the authorization system. For example, there is a distinct domain dedicated to advertisements, while other teams within Reddit can use the service for different domains, such as community moderation, and remain isolated from the advertising domain.
Shard ID: Provides an additional layer of sharding within the domain. In the advertising domain, sharding is organized by the advertiser's business ID, whereas in the community moderation domain, sharding could be based on community IDs.
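
The five fields can be pictured as a simple record. The following is a minimal Python sketch, not Reddit's actual schema: the class name, the shard_key helper, and all sample values are hypothetical, and the object field is called obj only to avoid clashing with Python's built-in name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    domain: str    # isolation layer 1: the use-case, e.g. "ads"
    shard_id: str  # isolation layer 2: e.g. an advertiser's business ID
    subject: str   # who or what the rule pertains to
    action: str    # what the subject is allowed to do
    obj: str       # what the subject may act upon ("Object" in the text)

    @property
    def shard_key(self) -> tuple:
        """The (domain, shard ID) pair that scopes every lookup."""
        return (self.domain, self.shard_id)


# Hypothetical rules: the same subject appears in two isolated domains.
rules = [
    Rule("ads", "business-42", "user:alice", "edit", "campaign:123"),
    Rule("moderation", "community-7", "user:alice", "remove", "post:*"),
]
```

Because the fields are plain strings with no validation, each domain is free to decide what a subject or object string means, which matches the flexibility the team was after.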

Rule Storage, OPAL, and GitOps

The system Reddit designed does not enforce validations on these fields. Each use-case has the freedom to store simple IDs or employ more sophisticated approaches, such as using paths to describe the scope of access. Each use-case can shape its rules as needed and encode any desired meaning into its policy for rule evaluation.

Whenever the service is asked to check access, it only has one type of query pattern to fulfill. Each check request is limited to a specific (domain, shard ID) combination, so the service simply needs to retrieve the bounded list of rules for that shard ID. Having this single simple query pattern keeps things fast and easily cacheable. This list of rules is then passed to the evaluation side of the service.

As mentioned above, Reddit’s team made a strategic decision to develop a centralized service for managing authorization. While they chose to build this capability themselves, similar results can also be achieved with OPAL.

Open Policy Administration Layer (OPAL) is an open-source administration layer for policy engines such as Open Policy Agent (OPA) and AWS' Cedar Agent. It detects changes to both policy and policy data in real time and pushes live updates to those agents. Using Git repositories and GitOps as a method for rule storage, OPAL provides several benefits:

Version Control: Using Git repositories for rule storage means that every change is tracked. This is crucial for audit trails, allowing teams to see who made changes, when, and why.
Rollback and History: In case of errors or unforeseen issues, it’s easy to roll back to previous versions of policies, enhancing the system's reliability.
Collaboration and Review: GitOps facilitates collaboration among team members. Changes can be reviewed through merge requests, ensuring that updates to policies undergo scrutiny before implementation.
Automated Deployment: Changes in the repository can trigger automated deployments, making the update process more efficient and reducing manual intervention.

Rule Evaluation, Policy, and Data Synchronization

After successfully establishing a system for efficiently retrieving rules, the next step for the Reddit team was to evaluate these rules and generate an answer for the client.

For each domain within their system, the ability to define specific policies determining how rules are evaluated was crucial. Although their application was written in Go, which would have facilitated the direct implementation of these policies, Reddit prioritized keeping policy logic distinctly separate from the application logic.

As mentioned previously, this separation served two key purposes: it prevented policy logic from inadvertently influencing other parts of the service and allowed for remote updating of policy logic, enabling clients to publish policy updates independently of service deployments.
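
The idea of per-domain policies that can be swapped without redeploying the service can be sketched as follows. This is a Python stand-in for illustration only: in Reddit's system the policies are evaluated by OPA, and the registry, function names, and sample rule here are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Rule = Tuple[str, str, str]  # (subject, action, object)
Policy = Callable[[Iterable[Rule], str, str, str], bool]

POLICIES: Dict[str, Policy] = {}  # one policy per domain


def publish_policy(domain: str, policy: Policy) -> None:
    """Install or replace a domain's policy at runtime (stand-in for a remote update)."""
    POLICIES[domain] = policy


def check(domain: str, rules: Iterable[Rule], subject: str, action: str, obj: str) -> bool:
    """Evaluate already-retrieved rules with the domain's current policy."""
    policy = POLICIES.get(domain)
    if policy is None:
        return False  # deny by default when a domain has no policy
    return policy(rules, subject, action, obj)


def exact_match(rules, subject, action, obj):
    """Policy A: a rule must match the request exactly."""
    return (subject, action, obj) in set(rules)


def prefix_match(rules, subject, action, obj):
    """Policy B: a rule's object is treated as a path prefix describing scope."""
    return any(s == subject and a == action and obj.startswith(o) for s, a, o in rules)


rules = [("user:alice", "edit", "/business/42/")]
request = ("user:alice", "edit", "/business/42/campaign/1")

publish_policy("ads", exact_match)
before = check("ads", rules, *request)  # False: no exact rule for this object

publish_policy("ads", prefix_match)     # policy update, no change to check()
after = check("ads", rules, *request)   # True: the path prefix now grants access
```

The check function never changes; only the published policy does. That is the property the separation is meant to provide.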

In parallel to the bespoke solution developed by Reddit, OPAL (Open Policy Administration Layer) stands out as a ready-made open-source solution offering similar capabilities. OPAL, when used in conjunction with OPA, acts as a dynamic administration layer, ensuring the policy engine is continuously synchronized with the latest policies and data. This is achieved by deploying OPAL Clients alongside OPA, which then subscribe to topic-based Pub/Sub updates. These updates are efficiently managed and disseminated from the OPAL Server, supplemented by data from various sources like databases, APIs, or third-party services.
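
To illustrate what topic-based Pub/Sub means here, the following toy Python sketch models a server fanning updates out to subscribed clients, each of which keeps a local copy of the data current. It is a conceptual, in-process model only: it does not use or reproduce OPAL's code or API, and the class names and update format are invented for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class UpdateServer:
    """Toy stand-in for a server that fans updates out by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, update: dict) -> int:
        """Deliver an update to every subscriber of the topic; return how many got it."""
        for callback in self._subscribers[topic]:
            callback(update)
        return len(self._subscribers[topic])


class EngineClient:
    """Toy stand-in for a client that keeps a local policy engine's data fresh."""

    def __init__(self, server: UpdateServer, topics: List[str]) -> None:
        self.data: Dict[str, object] = {}
        for topic in topics:
            server.subscribe(topic, self.apply)

    def apply(self, update: dict) -> None:
        # Overwrite the value stored at the given path with the new one.
        self.data[update["path"]] = update["value"]


server = UpdateServer()
ads_client = EngineClient(server, ["ads"])
moderation_client = EngineClient(server, ["moderation"])

delivered = server.publish("ads", {"path": "roles/alice", "value": ["editor"]})
```

Only the client subscribed to the "ads" topic receives the update, so each policy engine holds just the data relevant to it.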

The synergy of OPA and OPAL provides a comprehensive solution for managing authorization systems, particularly in environments as dynamic and complex as Reddit's advertising platform.

Auditing

To fulfill the auditability requirement described above, the Reddit team created a system of audit logs that records all decisions made by the service, which plays a crucial role in both compliance and security. The auditing mechanism was implemented in two distinct parts:

A change data capture pipeline: This system is engineered to track and upload all changes occurring within the database directly to BigQuery. This process ensures that every modification, whether minor or significant, is captured and stored securely for review and analysis.
Application logs: The application itself also logs all of the decisions, which are then uploaded to BigQuery by a sidecar. While this functionality was developed in-house, it's noteworthy that OPA also offers a decision log feature.
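
As a rough illustration of the application-log half, the sketch below writes one JSON line per authorization decision. The record fields and the function name are assumptions made for the example; Braden's post does not specify Reddit's log format, and the upload to BigQuery is not shown.

```python
import io
import json
from datetime import datetime, timezone


def log_decision(stream, *, domain, shard_id, subject, action, obj, allowed, rules_evaluated):
    """Append one authorization decision to `stream` as a JSON line and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "domain": domain,
        "shard_id": shard_id,
        "subject": subject,
        "action": action,
        "object": obj,
        "allowed": allowed,
        "rules_evaluated": rules_evaluated,  # how many rules the policy was given
    }
    stream.write(json.dumps(record) + "\n")
    return record


buffer = io.StringIO()  # stands in for the application's log output
log_decision(
    buffer,
    domain="ads",
    shard_id="business-42",
    subject="user:bob",
    action="edit",
    obj="campaign:123",
    allowed=False,
    rules_evaluated=2,
)
```

Recording the inputs alongside the outcome is what makes such a log useful for debugging: a denied request can be traced back to the exact subject, action, and object that were checked.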

Initially, these auditing features were primarily integrated for compliance and security purposes. However, as the system evolved, the team discovered an additional, invaluable benefit: these logs became a powerful tool for debugging. By providing detailed insights into the decision-making process of the authorization service, the team gained the ability to trace and rectify issues with greater efficiency and precision.

Performance

Following the implementation of their newly developed authorization service, the Reddit team undertook several key steps to align the service with the needs of their advertising platform.

They established a detailed rule structure, defined policies for rule evaluation, integrated authorization checks throughout the platform, and developed user interfaces tailored for rule definition on a per-business basis.

Upon reviewing the performance of the newly implemented service, the Reddit team reported outstanding results. The service has demonstrated impressive efficiency, with p99 latencies around 8 milliseconds and p50 latencies close to 3 milliseconds for authorization checks. These metrics are indicative of the service's ability to handle authorization requests swiftly and effectively.
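
For readers less familiar with percentile latencies, here is a small Python sketch of how p50 and p99 are derived from raw samples using the nearest-rank method. The sample latencies are made up for illustration and are not Reddit's measurements.

```python
import math


def percentile(samples, p):
    """Return the p-th percentile (integer p, 1..100) using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[max(rank, 1) - 1]


# 100 hypothetical latency samples in milliseconds
latencies_ms = [2] * 40 + [3] * 50 + [5] * 8 + [8] + [40]

p50 = percentile(latencies_ms, 50)  # 3: half of the checks finished within 3 ms
p99 = percentile(latencies_ms, 99)  # 8: 99% of the checks finished within 8 ms
```

The single 40 ms outlier does not affect p99 in this sample, which is why tail percentiles are usually reported together with the median rather than a maximum or an average.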

Equally notable is the service's remarkable stability. Since its launch over a year ago (at the time Braden published his post), the service has operated without any outages, underscoring its reliability. Interestingly, the majority of issues encountered were related to logical errors within the policies themselves, rather than the infrastructure or the software.

A critical factor contributing to the service's success is the separation of policy and code and its effective use of audit logs. This system ensures that the results of every check are accurately recorded, and isolated from other software complexities. In many systems, the lack of clear separation between policy and code can lead to muddled results and increased difficulty in debugging. However, by decoupling policy from code and maintaining detailed audit logs, Reddit has been able to achieve clear, unambiguous insights into the authorization process. This clarity not only aids in compliance and security but also significantly enhances the debugging and maintenance process, ensuring the system remains efficient and reliable.

Conclusion

Reddit's journey in crafting the authorization system for its advertising platform shows the complex challenges in the realm of Ad Tech. Their endeavor, as described by Staff Engineer Braden Groom, underscores a crucial evolution in managing digital advertising's nuanced authorization demands.

The choice to decouple policy from code, inspired by Google's Zanzibar, and the implementation of Open Policy Agent (OPA) reflect Reddit’s commitment to building a system that is both robust and adaptable. The auditing mechanisms they established, ensuring every action is logged and reviewable, not only enhance security and compliance but also serve as a powerful tool for debugging and system refinement.

Reddit's results speak for themselves. The performance metrics of their authorization checks and the system's remarkable stability since its launch are testaments to the efficacy of their solution. The clear separation of policy and code, paired with comprehensive audit logs, has provided them with a system that is efficient, reliable, and adaptable to ever-changing needs.

We also learned about OPAL, an open-source solution that echoes the functionalities developed by Reddit. Its use of Git repositories and GitOps for rule storage, combined with the dynamic synchronization capabilities provided by OPAL Clients, offers a streamlined, efficient alternative for managing complex authorization systems. For organizations seeking to implement sophisticated authorization management without building from scratch, OPAL presents a compelling option, enabling them to stay agile and responsive in the fast-evolving landscape of digital advertising.

Reddit's case study is a shining example of how thoughtful, well-engineered solutions can successfully meet the intricate demands of Ad Tech authorization. It serves as both a blueprint and an inspiration for others in the industry, showcasing the power of innovative thinking in overcoming complex technological challenges.

Want to learn more about Authorization? Join our Slack community, where there are hundreds of devs building and implementing authorization.
