How SafeLine Leverages Nginx
Lulu
Posted on August 23, 2024
Web Application Firewalls (WAFs) are security tools designed to protect against threats targeting HTTP/HTTPS traffic. In simple terms, a WAF identifies and blocks malicious traffic, allowing only safe, filtered traffic to reach your upstream servers. Among the various deployment modes, the reverse proxy mode has always been popular, and here's how it works:
In reverse proxy mode, the WAF receives HTTP requests from the internet, scans them for threats, and then forwards the safe requests to the upstream servers. To the outside world, the WAF appears to be the web server, hiding the actual servers from the client. Since all HTTP traffic first passes through the WAF, it’s crucial that the WAF can efficiently handle this traffic.
SafeLine uses Nginx to handle HTTP traffic in both its reverse proxy and transparent proxy modes. In this article, we’ll take a closer look at Nginx and how SafeLine utilizes it.
1. What is Nginx?
Nginx is described on its official website as:
"nginx [engine x] is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server."
Nginx provides a wide range of functionalities, including HTTP server/reverse proxy, mail proxy, and generic TCP/UDP proxy services. Originally developed by Igor Sysoev, Nginx was later acquired by F5, which now offers a commercial version. Even so, the open-source version of Nginx remains highly active, and according to the latest report from Netcraft, it continues to hold the largest market share in the web server domain.
Nginx’s outstanding code architecture is key to its performance, and we'll explore some of these design elements next.
2. The Secrets Behind Nginx’s High Performance
2.1 Master/Worker Process Separation
Nginx operates on a multi-process model, specifically using a "single master + multiple workers" architecture. Here’s how these processes are divided:
Master Process: The master process doesn’t handle network events or execute business logic. Its role is to manage worker processes, enabling service restarts, smooth upgrades, log file rotations, and real-time configuration changes. The master process controls the worker processes via signals.
Worker Process: The worker processes handle business requests. Each worker process is single-threaded, and you can configure the number of worker processes. In production environments, it’s common to set the number of worker processes equal to the number of CPU cores and bind them to specific cores to maximize CPU utilization and reduce context switching costs.
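In nginx.conf, this setup is typically expressed with just a few directives (a minimal sketch; "auto" asks Nginx to match the machine's core count and choose the core mapping itself):

```nginx
# One single-threaded worker per CPU core; "auto" matches the core count.
worker_processes auto;

# Pin each worker to a core to reduce cache misses and context switches
# (Linux only; "auto" lets Nginx choose the mapping).
worker_cpu_affinity auto;

events {
    # Maximum simultaneous connections per worker process.
    worker_connections 1024;
}
```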
In networking, we often talk about the "control plane" and "data plane". In Nginx’s process model, the master process serves as the control plane, managing workers without handling requests. When a worker process exits unexpectedly, the master process creates a new worker to minimize disruption.
The worker process, acting as the data plane, handles real business requests. Since each worker is single-threaded, there’s no overhead from thread synchronization. Each HTTP request is handled by a single worker process throughout its lifecycle, avoiding inter-process communication and data synchronization overhead.
This "master/worker" model is the foundation of Nginx's high reliability and performance.
2.2 Event-Driven Architecture
A high-performance network server needs to handle various events like network, disk, and timer events, with network I/O being the most critical. Traditional network programming models often use a "one-thread-per-connection" or "one-process-per-connection" approach, where threads or processes consume network events. However, Nginx employs a fully event-driven architecture to handle its workload:
Nginx’s event-driven framework collects and dispatches network, timer, and other events.
Business modules act as event consumers, registering their interest in specific events in advance.
When an event occurs (e.g., a readable/writable network connection), the event-driven framework invokes the appropriate event consumer, allowing the business module to execute.
Depending on the operating system, Nginx selects the optimal multiplexing interface, such as epoll on Linux.
Nginx’s event-driven architecture significantly boosts network performance and throughput. The trade-off is that business modules must avoid blocking during execution: a blocked worker cannot process other pending events promptly, which reduces that worker’s overall throughput.
2.3 Excellent Modular Design
In Nginx, almost everything is a module. Nginx defines highly abstract interfaces for "modules", requiring all modules to follow the "ngx_module_t" interface specification. Additionally, modules are categorized and layered. Here’s a brief overview of Nginx’s module design:
Nginx’s framework directly defines "core modules" and "configuration modules." Configuration modules handle basic configuration parsing, while core modules form the foundation of the Nginx framework; the framework itself interacts directly only with these core modules, and other module types hang off them.
The "ngx_http_module" is the core module responsible for HTTP functionality. It defines a new module type, the HTTP module type. All HTTP modules must adhere to the "ngx_http_module_t" interface specification.
Among all HTTP modules, the "ngx_http_core_module" is special, implementing the most fundamental HTTP logic.
Nginx’s excellent modular design ensures that despite its complexity, the codebase remains well-organized. Developers can also extend Nginx by developing their own modules based on these interfaces.
2.4 Other Notable Features
As a high-performance server, Nginx boasts numerous impressive design features, including:
Written in C, Nginx implements its own data structures such as doubly linked lists, red-black trees, and hash tables, since the C standard library doesn’t provide them.
Nginx has its own memory pools, which allocate efficiently and release all of a request’s memory at once when the request ends, sparing developers from tracking individual allocations.
Nginx offers the "ngx_buf_t" data structure for efficient buffer operations.
Nginx supports various load balancing algorithms, such as round-robin and hash-based algorithms, to distribute requests among a group of upstream servers.
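In configuration terms, load balancing is an "upstream" block. The sketch below shows weighted round-robin (the default algorithm) with a hash-based alternative commented out; the server addresses are illustrative:

```nginx
upstream app_backend {
    # Weighted round-robin (the default): 10.0.0.1 gets twice the requests.
    server 10.0.0.1:8080 weight=2;
    server 10.0.0.2:8080;

    # Alternative hash-based strategy: pin each URI to one server.
    # hash $request_uri consistent;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backend;
    }
}
```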
3. SafeLine's t1k Module
Given Nginx’s strength as a reverse proxy server, SafeLine uses Nginx as the WAF’s traffic forwarding engine, handling HTTP traffic. However, beyond forwarding, a WAF’s core function is to inspect traffic. SafeLine has developed an HTTP module within Nginx called the t1k module, which sends traffic to SafeLine’s detection engine and decides whether to block it (with a 403 response) or allow the request to proceed.
Thanks to Nginx’s extensibility, implementing these functions as a module is straightforward. SafeLine's t1k module works as follows:
During Nginx’s access phase of processing HTTP requests, the module performs request inspection by generating a subrequest, which is redirected to a special internal location, usually named "@safeline."
This special location’s handler, when processing the subrequest, uses Nginx’s upstream mechanism to connect to SafeLine’s detection engine and send the content for inspection.
The inspection results are parsed, and the subrequest processing ends.
The original request flow resumes, and based on the inspection results, t1k decides whether to return a 403 page or forward the request to the next processing stage.
Response inspection is achieved using Nginx’s HTTP body filtering mechanism.
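SafeLine’s actual t1k directives aren’t reproduced here, but stock Nginx ships a module built on the very same access-phase subrequest pattern: "auth_request" (which, unlike t1k, forwards only headers, not the body). An analogous configuration looks like this, with the location name and detector address purely illustrative:

```nginx
server {
    listen 80;

    location / {
        # Access-phase subrequest: a 2xx reply lets the request continue;
        # a 403 from the inspector is returned to the client.
        auth_request /inspect;
        proxy_pass http://app_backend;
    }

    location = /inspect {
        internal;                          # like t1k's internal "@safeline" location
        proxy_pass http://127.0.0.1:8000;  # stand-in for the detection engine
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
    }
}
```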
Thanks to Nginx’s highly customizable and dynamic module-loading capabilities, the t1k module can also be integrated into existing Nginx/OpenResty servers (including API gateway products like Kong and APISIX, which are based on OpenResty). This forms the basis of our "embedded deployment mode".