Nginx: Everything about proxy_pass
Daniel Albuschat
Posted on August 20, 2019
With the advent of Microservices™, ingress routing and routing between services have been an ever-increasing demand. I currently default to nginx for this - with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently.
However, the often needed proxy_pass directive has driven me crazy because of its - to me unintuitive - behavior. So I decided to take notes on how it works and what is possible with it, and how to circumvent some of its quirks.
First, a note on https
By default, proxy_pass does not verify the certificate of the endpoint if it is https (how can this be the default behavior, really?!). Skipping verification can be useful internally, but usually you want it to be a very explicit decision. And in case you use publicly routed endpoints, which I have done in the past, make sure to set proxy_ssl_verify to on. You can also authenticate against the upstream server that you proxy_pass to using client certificates and more. Make sure to have a look at the available options at https://docs.nginx.com/nginx/admin-guide/security-controls/securing-http-traffic-upstream/.
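As a minimal sketch of what an explicitly verified upstream could look like, the location below turns verification on and points nginx at a CA bundle. The hostname api.example.com, the /secure/ location and all file paths are placeholders I made up for illustration; adjust them to your system.

```nginx
location /secure/ {
    proxy_pass https://api.example.com/;

    # Verify the upstream certificate against a trusted CA bundle
    # (the path is an assumption; Debian/Ubuntu keep the bundle here).
    proxy_ssl_verify              on;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth        2;

    # Send SNI so the upstream can present the matching certificate.
    proxy_ssl_server_name         on;

    # Optional: authenticate against the upstream with a client certificate
    # (hypothetical file names).
    proxy_ssl_certificate         /etc/nginx/client.pem;
    proxy_ssl_certificate_key     /etc/nginx/client.key;
}
```

The client certificate lines are only needed if the upstream actually asks for one; the first three proxy_ssl_* lines are the part that closes the "no verification by default" gap.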
A simple example
A proxy_pass is usually used when there is an nginx instance that handles many things and delegates some of those requests to other servers. One example is an ingress in a Kubernetes cluster that spreads requests among the different microservices that are responsible for the specific locations. Or you can use nginx to deliver static files for a frontend directly, while some server-side rendered content or API is delivered by a WebApp such as ASP.NET Core or flask.
Let's imagine we have a WebApp running on http://localhost:5000 and want its /api/ path to be available on http://localhost:8080/webapp/. Here's how we would do it in a minimal nginx.conf:
daemon off;
events {
}
http {
    server {
        listen 8080;
        location /webapp/ {
            proxy_pass http://127.0.0.1:5000/api/;
        }
    }
}
You can save this to a file, e.g. nginx.conf, and run it with nginx -c $(pwd)/nginx.conf.
Now, you can access http://localhost:8080/webapp/ and all requests will be forwarded to http://localhost:5000/api/.
Note how the /webapp/ prefix is "cut away" by nginx. That's how a location works together with a proxy_pass that contains a path: nginx replaces the part of the request path that matches the location with the path given in proxy_pass, and passes the result on to the "upstream". "Upstream" is the term for whatever server sits behind nginx.
To slash or not to slash
Except for when you use variables in the proxy_pass upstream definition, as we will learn below, the location and upstream definition are tied together very simply: the location prefix is swapped for the upstream path, character by character. That's why you need to be aware of the slashes, because some strange things can happen when you don't get it right.
Here is a handy table that shows you how the request will be received by your WebApp, depending on how you write the location and proxy_pass declarations. Assume all requests go to http://localhost:8080:
location | proxy_pass | Request | Received by upstream |
---|---|---|---|
/webapp/ | http://localhost:5000/api/ | /webapp/foo?bar=baz | /api/foo?bar=baz |
/webapp/ | http://localhost:5000/api | /webapp/foo?bar=baz | /apifoo?bar=baz |
/webapp | http://localhost:5000/api/ | /webapp/foo?bar=baz | /api//foo?bar=baz |
/webapp | http://localhost:5000/api | /webapp/foo?bar=baz | /api/foo?bar=baz |
/webapp | http://localhost:5000/api | /webappfoo?bar=baz | /apifoo?bar=baz |
In other words: You usually want a trailing slash on both, never want to mix with and without trailing slash, and only want to omit the trailing slash on both when you want to concatenate a certain path component together (which I guess is quite rarely the case). Note how query parameters are preserved!
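One case the table does not cover is a proxy_pass without any path at all, not even a trailing slash. As far as I understand the nginx documentation, nginx then leaves the request URI untouched instead of replacing the location prefix. A small sketch, reusing the port from the examples above:

```nginx
location /webapp/ {
    # No URI part after host:port, so nothing is replaced:
    # /webapp/foo?bar=baz reaches the upstream as /webapp/foo?bar=baz
    proxy_pass http://127.0.0.1:5000;
}
```

So http://127.0.0.1:5000 and http://127.0.0.1:5000/ are not the same thing: the second one has a URI part ("/") and will replace /webapp/ with it.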
$uri and $request_uri
You have two ways to prevent the location from being cut off: First, you can simply repeat the location in the proxy_pass definition, which is quite easy:
location /webapp/ {
    proxy_pass http://127.0.0.1:5000/api/webapp/;
}
That way, your upstream WebApp will receive /api/webapp/foo?bar=baz in the above examples.
Another way to repeat the location is to use $uri or $request_uri. The difference is that $request_uri preserves the query parameters, while $uri discards them:
location | proxy_pass | Request | Received by upstream |
---|---|---|---|
/webapp/ | http://localhost:5000/api$request_uri | /webapp/foo?bar=baz | /api/webapp/foo?bar=baz |
/webapp/ | http://localhost:5000/api$uri | /webapp/foo?bar=baz | /api/webapp/foo |
Note how in the proxy_pass definition, there is no slash between "api" and $request_uri or $uri. This is because a full URI will always include a leading slash, which would lead to a double-slash if you wrote "api/$uri".
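Written out as a location block, the two table rows could look like the sketch below. The comments restate the table; the remark about normalization is based on my reading of the nginx docs, which describe $uri as the normalized path of the current request and $request_uri as the full original URI as sent by the client.

```nginx
location /webapp/ {
    # $request_uri is the original URI including the query string:
    # /webapp/foo?bar=baz -> /api/webapp/foo?bar=baz
    proxy_pass http://127.0.0.1:5000/api$request_uri;

    # Alternative: $uri is the normalized path without the query string:
    # /webapp/foo?bar=baz -> /api/webapp/foo
    # proxy_pass http://127.0.0.1:5000/api$uri;
}
```

Only one proxy_pass is allowed per location, which is why the second variant is commented out.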
Capture regexes
While this is not exclusive to proxy_pass, I find it generally handy to be able to use regexes to forward parts of a request to an upstream WebApp, or to reformat it. Example: Your public URI should be http://localhost:8080/api/cart/items/123, and your upstream API handles it in the form of http://localhost:5000/cart_api?items=123. In this case, or more complicated ones, you can use a regex to capture parts of the request URI and transform them into the desired format.
location ~ ^/api/cart/([a-z]*)/(.*)$ {
    proxy_pass http://127.0.0.1:5000/cart_api?$1=$2;
}
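The same idea can be written with named captures, which I find easier to read once there are more than two groups. This is only a sketch: the names $key and $value are made up for this example, and the stricter patterns ([a-z]+ and [^/]+) are my own choice rather than a requirement.

```nginx
location ~ ^/api/cart/(?<key>[a-z]+)/(?<value>[^/]+)$ {
    # /api/cart/items/123 -> /cart_api?items=123
    proxy_pass http://127.0.0.1:5000/cart_api?$key=$value;
}
```

Note that in both variants the query string of the original request is not forwarded, because the URI in proxy_pass is built entirely from the captures.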
Use try_files with a WebApp as fallback
A use-case I came across was that I wanted nginx to handle all static files in a folder, and if the file is not available, forward the request to a backend. For example, this was the case for a Vue single-page-application (SPA) that is delivered through flask - because the master HTML needs some server-side tuning - and I wanted nginx to handle the static files instead of flask. (This is recommended by the official gunicorn docs.)
You might have everything for your SPA except for your index.html available at /app/wwwroot/, and http://localhost:5000/ will deliver your server-tuned index.html.
Here's how you can do this:
location /spa/ {
    root /app/wwwroot/;
    try_files $uri @backend;
}
location @backend {
    proxy_pass http://127.0.0.1:5000;
}
Note that, for some reason, you cannot specify any path in the proxy_pass directive inside the @backend location. Nginx will tell you:
nginx: [emerg] "proxy_pass" cannot have URI part in location given by regular expression, or inside named location, or inside "if" statement, or inside "limit_except" block in /home/daniel/projects/nginx_blog/nginx.conf:28
That's why your backend should receive any request and return the index.html for it, or at least for the routes that are handled by the frontend's router.
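To make the file lookup and the fallback a bit more tangible, here is the same setup as an annotated server block. It is a sketch, not a drop-in config: the file name js/app.js and the route /spa/some/route are made-up examples, and the comments describe how I understand root and named locations to behave.

```nginx
server {
    listen 8080;

    location /spa/ {
        # root is prepended to the full request path, so
        # GET /spa/js/app.js is looked up at /app/wwwroot/spa/js/app.js
        root /app/wwwroot/;

        # Serve the file if it exists, otherwise hand over to @backend.
        try_files $uri @backend;
    }

    location @backend {
        # No URI part allowed here, so the upstream receives the original
        # request unchanged, e.g. /spa/some/route
        proxy_pass http://127.0.0.1:5000;
    }
}
```

If your files live directly under /app/wwwroot/ without a spa/ subfolder, the lookup above will miss them and every request will end up at the backend, so it is worth checking the directory layout first.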
Let nginx start even when not all upstream hosts are available
One reason that I used 127.0.0.1 instead of localhost so far is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx will try to resolve all hostnames used in proxy_pass directives on startup, and fail to start when they cannot be resolved. However, especially in microservice environments, it is very fragile to require all upstream services to be available at the time the ingress, load balancer or some intermediate router starts.
You can circumvent nginx's requirement for all hosts to be resolvable at startup by using variables inside the proxy_pass directives. HOWEVER, for some unfathomable reason, if you do so, you require a dedicated resolver directive to resolve these hostnames. For Kubernetes, you can use kube-dns.kube-system here. For other environments, you can use your internal DNS, or for publicly routed upstream services you can even use a public DNS such as 1.1.1.1 or 8.8.8.8.
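A small sketch of what the resolver part could look like for a publicly routed upstream. The hostname api.example.com is a placeholder, and the valid and resolver_timeout values are arbitrary choices of mine, not recommendations.

```nginx
server {
    listen 8080;

    # Public DNS servers; cache answers for 30s instead of the record's TTL.
    resolver 1.1.1.1 8.8.8.8 valid=30s;
    resolver_timeout 5s;

    location /webapp/ {
        # Hypothetical hostname. Because it sits in a variable, it is
        # resolved when a request comes in, not when nginx starts.
        set $upstream http://api.example.com;

        # No path in $upstream, so the request URI is passed on unchanged.
        proxy_pass $upstream;
    }
}
```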
Additionally, using variables in proxy_pass completely changes how URIs are passed on to the upstream. When just changing
proxy_pass https://localhost:5000/api/;
to
set $upstream https://localhost:5000;
proxy_pass $upstream/api/;
... which you might think should result in exactly the same, you might be surprised. The former will hit your upstream server with /api/foo?bar=baz with our example request to /webapp/foo?bar=baz. The latter, however, will hit your upstream server with /api/. No foo. No bar. And no baz. :-(
We need to fix this by putting the request together from two parts: First, the path after the location prefix, and second the query parameters. The first part can be captured using the regex we learned above, and the second (query parameters) can be forwarded using the built-in variables $is_args and $args. If we put it all together, we will end up with a config like this:
daemon off;
events {
}
http {
    server {
        access_log /dev/stdout;
        error_log /dev/stdout;
        listen 8080;
        # My home router in this case:
        resolver 192.168.178.1;
        location ~ ^/webapp/(.*)$ {
            # Use a variable so that localhost:5000 might be down while nginx starts:
            set $upstream http://localhost:5000;
            # Put together the upstream request path using the captured component after the location path, and the query parameters:
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}
While localhost is not a great example here, it works with your service's arbitrary DNS names, too. I find this very valuable in production, because having an nginx refuse to start because of a probably very unimportant service can be quite a hassle while wrangling a production issue. However, it makes the location directive much more complex. From a simple location /webapp/ with a proxy_pass http://localhost:5000/api/ it has become this behemoth. I think it's worth it, though.
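An alternative that keeps the plain prefix location is to rewrite the path first and give proxy_pass a variable without any path. Treat this as a sketch under the assumption that a rewrite with the break flag leaves the changed URI (plus the original query string) for proxy_pass to forward; please verify it against your nginx version before relying on it.

```nginx
location /webapp/ {
    # Use a variable so that localhost:5000 might be down while nginx starts:
    set $upstream http://localhost:5000;

    # Change the path in place and stop rewrite processing:
    # /webapp/foo?bar=baz -> /api/foo?bar=baz
    rewrite ^/webapp/(.*)$ /api/$1 break;

    # No path in the variable, so the rewritten URI is passed on.
    proxy_pass $upstream;
}
```

The resolver directive from the full example is still needed, because $upstream contains a hostname.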
Better logging format for proxy_pass
To debug issues, or simply to have enough information at hand when investigating issues in the future, you can maximize the information about what is going on in your location that uses proxy_pass.
I found this handy log_format, which I enhanced with a custom variable $upstream, as we have defined above. If you always call your variables $upstream in all your locations that use proxy_pass, you can use this log_format and have often much-needed information in your log:
log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: $upstream: $request upstream_response_time $upstream_response_time msec $msec request_time $request_time';
Here is a full example:
daemon off;
events {
}
http {
    log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: "$upstream": "$request" upstream_response_time $upstream_response_time msec $msec request_time $request_time';
    server {
        listen 8080;
        # Regex location, because a variable in proxy_pass would otherwise drop the path after /webapp/:
        location ~ ^/webapp/(.*)$ {
            access_log /dev/stdout upstream_logging;
            set $upstream http://127.0.0.1:5000;
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}
However, I have not found a way to log the actual URI that is forwarded to $upstream, which would be one of the most important things to know when debugging proxy_pass issues.
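One possible workaround, at least when you assemble the upstream URI yourself with a regex location, is to store the assembled path in a second variable and log that. The variable name $forwarded_uri and the log format name upstream_uri_logging are made up for this sketch; the built-in $upstream_addr and $upstream_status variables are added because they tend to help as well. This is a fragment of the http block, not a complete nginx.conf.

```nginx
http {
    log_format upstream_uri_logging '[$time_local] $remote_addr to: "$upstream$forwarded_uri" '
                                    'addr $upstream_addr status $upstream_status';
    server {
        listen 8080;
        location ~ ^/webapp/(.*)$ {
            access_log /dev/stdout upstream_uri_logging;
            set $upstream http://127.0.0.1:5000;
            # Build the path once, then use it for both proxying and logging:
            set $forwarded_uri /api/$1$is_args$args;
            proxy_pass $upstream$forwarded_uri;
        }
    }
}
```

This only logs what the config asked nginx to send, so it does not help with plain prefix locations where nginx does the replacement internally.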
Conclusion
I hope that you have found helpful information in this article that you can put to good use in your development and production nginx configurations.