Migrating a WordPress blog from a subdirectory to a subdomain without losing URL structure with Nginx
Bouchaala Reda
Posted on August 18, 2022
If you just want to look at the full Nginx config used, jump to the end of the post.
From an SEO perspective, whether to use a subdirectory or a subdomain for your blog/site is a subject of debate. Have a look at this article for an example.
The goal of this post, however, is to help you physically move your WordPress site from a subdirectory to a subdomain without actually having to change your URL structure (i.e. it still looks like it lives in a subdirectory).
The reasons why you'd want to do this may vary. Here are mine:
- We had a WP blog that lived in the same repository as the main website (let's call it site.com).
- In order for the SEO team to manage the blog, they had to go through the tech team every time they needed something changed/added.
- We have a lot of blog traffic, and we didn't want to lose any traffic or SEO credit by moving everything from site.com/blog to blog.site.com.
So the requirements for this migration are now clear:
1. Move the WP blog from the site.com repository to a managed WordPress hosting provider like HostGator, Hostinger, etc.
2. The blog under site.com/blog should still be accessible by visitors as normal, and it should serve the new managed blog at blog.site.com.
3. blog.site.com MUST NOT be accessible directly by visitors, only via site.com/blog.
4. Blog traffic must be served with HTTPS.
Requirement number 1 is quite straightforward. We are left with requirements 2 to 4.
The way I did it was by adding an Nginx reverse proxy on the main website (site.com) to serve the blog contents from blog.site.com as if they were hosted under site.com/blog. So let's make a start on that.
Here's a diagram of how things work. We'll dive into the Nginx config next.
Nginx Reverse Proxy Config
A basic reverse proxy config looks like this:
# Any URL path that starts with /blog will be using this config block.
location /blog {
    # Request paths coming into our main site will be /blog/something
    # But we want to send requests to blog.site.com as /something
    # So we use rewrite to strip /blog/ from the request path
    rewrite /blog/(.*) /$1 break;
    proxy_pass http://blog.site.com;
}
This is the base config we can work with. Nginx basically catches any request made to /blog and fetches the contents from blog.site.com. A reverse proxy in its simplest form.
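To make the rewrite step concrete, here is a small Python sketch (not part of the Nginx setup) that models what `rewrite /blog/(.*) /$1 break;` does to the request path. The function name `upstream_path` is made up for illustration, and the model ignores query strings, which Nginx handles separately.

```python
import re

# Same pattern as in the rewrite directive; it is searched for in the request path.
BLOG_PREFIX = re.compile(r"/blog/(.*)")


def upstream_path(request_path: str) -> str:
    """Return the path that would be sent to blog.site.com (illustrative model)."""
    match = BLOG_PREFIX.search(request_path)
    if match is None:
        # No match: the rewrite does nothing and the path is proxied unchanged.
        return request_path
    # On a match, the whole path is replaced by "/" plus the captured group.
    return "/" + match.group(1)


if __name__ == "__main__":
    for path in ("/blog/my-post/", "/blog/wp-content/logo.png", "/blog"):
        print(path, "->", upstream_path(path))
```

One thing the sketch makes visible: a bare /blog (no trailing slash) does not match the pattern, so in this model it is passed to the upstream as-is.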
Problem 1: We're not using HTTPS
But as you can see, we're using HTTP and not HTTPS, which does not fulfill requirement number 4. So let's configure Nginx to use HTTPS and secure the connection to the blog using HTTP Basic Auth.
location /blog {
    # ...
    rewrite /blog/(.*) /$1 break;
    # SSL config for proxy
    proxy_ssl_server_name on; # 1
    proxy_ssl_session_reuse on; # 2
    proxy_set_header Host blog.site.com; # 3
    proxy_set_header X-Forwarded-Proto https; # 3
    proxy_set_header X-Forwarded-Port 443; # 3
    proxy_set_header X-Real-IP $remote_addr; # 3
    proxy_set_header X-Forwarded-Host $host; # 3
    proxy_set_header Authorization "Basic {CREDENTIALS}"; # 4
    # Proxy
    proxy_pass https://blog.site.com;
}
Here's an explanation of the directives we added to our config:
1. proxy_ssl_server_name on; makes Nginx send the server name via TLS SNI (Server Name Indication) when it connects to blog.site.com. This is required when the server on the other end hosts several websites with different SSL certificates on one IP address: with SNI, that server knows which SSL certificate to present. Note that this directive was introduced in Nginx 1.7.0, so make sure you're using 1.7.0+.
2. proxy_ssl_session_reuse on; re-uses the previously negotiated SSL session to do an abbreviated handshake, which is cheaper than doing a CPU-intensive full handshake each time we connect. This is a performance improvement.
3. The proxy_set_header directive is used to set some required and informational headers to be sent along with the request to blog.site.com, e.g. the correct Host header.
4. Here we set the Authorization header to a Basic one (HTTP Basic Auth). The username/password need to be configured at the blog.site.com level (if you're using managed WP hosting, they'll most likely have an HTTP Basic Auth section somewhere in the site config). {CREDENTIALS} is just a placeholder; replace it, braces included, with the base64 encoding of username:password.
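If you're unsure how to produce the base64 value for the Authorization header, a few lines of Python will do it. This is only a convenience sketch: the helper name `basic_auth_header` and the sample credentials are placeholders, not something from the setup described here.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value to put after `proxy_set_header Authorization`."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # "editor" / "change-me" are placeholder credentials; use your own.
    print(basic_auth_header("editor", "change-me"))
```

The printed string is what goes between the double quotes in the Nginx directive.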
Our Nginx config is now ready, and it will start serving the blog traffic from blog.site.com as requested.
Problem 2: Incorrect links in blog pages
We're now faced with another problem. The WP blog at blog.site.com will have all page links pointing to blog.site.com/page-url, which exposes our managed blog and breaks our requirements.
That means that when a visitor first opens our blog, everything will look good, but whenever the visitor clicks on any link on the page, they'll be taken to blog.site.com/page-url. Definitely not what we want.
Fortunately, Nginx can help us with that as well. The solution is to use Nginx's ngx_http_sub_module, which lets us modify the response from blog.site.com by replacing string occurrences with other ones before sending it to the visitor. (If you compile Nginx yourself, note that this module is not built by default and has to be enabled with --with-http_sub_module.) Let's see how we might do that by adding to our previous Nginx config:
location /blog {
    # ...
    rewrite /blog/(.*) /$1 break;
    # SSL config for proxy
    proxy_ssl_server_name on; # 1
    proxy_ssl_session_reuse on; # 2
    proxy_set_header Host blog.site.com; # 3
    proxy_set_header X-Forwarded-Proto https; # 3
    proxy_set_header X-Forwarded-Port 443; # 3
    proxy_set_header X-Real-IP $remote_addr; # 3
    proxy_set_header X-Forwarded-Host $host; # 3
    proxy_set_header Authorization "Basic {CREDENTIALS}"; # 4
    # 5
    proxy_set_header Accept-Encoding "";
    # 6
    sub_filter_once off;
    sub_filter_last_modified on;
    sub_filter_types text/html text/css text/xml text/javascript application/json;
    sub_filter 'blog.site.com' 'site.com/blog';
    # 7
    sub_filter 'src="/wp-content/' 'src="/blog/wp-content/';
    # 8
    sub_filter 'http:' 'https:';
    # Proxy
    proxy_pass https://blog.site.com;
}
Let's explain what we added there:
5. Ask the blog not to compress its response (by sending an empty Accept-Encoding header), which is required for Nginx to be able to change the response.
6. We substitute every occurrence of blog.site.com in the response with site.com/blog on all HTML, CSS, JS, XML & JSON response types.
7. We prefix root-relative asset URLs sent by WP (/wp-content/) with /blog/.
8. We replace all insecure links with secure ones so that we don't get any browser errors, since we are using HTTPS on our main site and also between site.com & blog.site.com.
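To see the combined effect of the three sub_filter rules above on a page, here is a rough Python model. It is an approximation only: it applies plain, case-sensitive `str.replace` calls one after another, whereas Nginx matches the strings ignoring case while streaming the response. The function name `rewrite_body` and the sample HTML are invented for the example.

```python
# (search, replacement) pairs copied from the sub_filter directives above.
SUBSTITUTIONS = [
    ("blog.site.com", "site.com/blog"),
    ('src="/wp-content/', 'src="/blog/wp-content/'),
    ("http:", "https:"),
]


def rewrite_body(body: str) -> str:
    """Apply each substitution to every occurrence (like sub_filter_once off)."""
    for search, replacement in SUBSTITUTIONS:
        body = body.replace(search, replacement)
    return body


if __name__ == "__main__":
    sample = (
        '<a href="http://blog.site.com/faq">FAQ</a>'
        '<img src="/wp-content/uploads/logo.png">'
    )
    print(rewrite_body(sample))
```

A model like this is handy for checking a saved copy of a blog page for URLs the rules do not cover (for example `href="/wp-content/..."`, which rule 7 does not touch) before going live.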
Problem 3: Incorrect blog redirects
The last problem we have is a tricky one to solve. The only good solution I found is kind of a hack, so if you have better ideas please let me know in the comments. (I say good here because I probably could've used the if directive, but we know that it causes problems when used in a location block.)
The problem is, whenever the blog actually returns a redirect response, the link in the Location header will be incorrect in some cases. Sometimes the blog returns relative URLs without https:// (e.g. /faq-page), and sometimes it returns absolute URLs that are complete and start with https://. We only need to rewrite Location header links if they are relative, not absolute.
With the help of Nginx's ngx_http_map_module and the map directive (think of it as a simple switch/case statement), we can create a dynamic variable (its value depends on other values/variables) that will hold the prefix we need to add to the Location header link.
Add this section before the server block of your Nginx config (the map directive is only valid in the http context):
map $upstream_http_location $_upstream_http_location_prefix { # 1
    default $upstream_http_location; # 2
    "~^/" "/blog"; # 3
    "~*^http" ""; # 4
    "~*^((?!http|\/).)*" "/blog/"; # 5
}
Let's explain what each line does:
1. The first line uses the map directive to create a dynamic variable called $_upstream_http_location_prefix whose value depends on the $upstream_http_location variable. The latter holds the Location value sent by the upstream (our blog.site.com).
2. The default value of the variable is the Location header value itself. This is just to be on the safe side, although it will probably never be used for an actual redirect because the cases below cover every value a redirect is likely to carry.
3. If the Location header value starts with /, then the prefix will be /blog.
4. If the Location header value starts with http (in any letter case), then the prefix will be empty.
5. If the Location header value starts with neither / nor http, then the prefix will be /blog/ (with a trailing slash, since the value itself has no leading one).
Now that we have our Location header prefix ready, we can use it in our location /blog block like so:
location /blog {
    # ...
    proxy_hide_header Location; # 1
    add_header Location "$_upstream_http_location_prefix$upstream_http_location"; # 2
    # Proxy
    proxy_pass https://blog.site.com;
}
1. We first hide the original Location header sent to us by the upstream (our blog).
2. We add a new Location header whose value is OUR_CALCULATED_PREFIX + ORIGINAL_VALUE, and by doing that we effectively rewrite the header value depending on what it starts with.
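The map rules and the prefix + original value concatenation can be modelled in a few lines of Python to check which Location values end up where. This is only a sketch: `rewritten_location` is a made-up name, the regexes are transcribed from the map block, and it deliberately covers only responses that actually carry a Location header.

```python
import re

# Regexes from the map block, in the order they are declared.
PREFIX_RULES = [
    (re.compile(r"^/"), "/blog"),
    (re.compile(r"^http", re.IGNORECASE), ""),
    (re.compile(r"^((?!http|/).)*", re.IGNORECASE), "/blog/"),
]


def rewritten_location(location: str) -> str:
    """Return prefix + original value for a non-empty upstream Location."""
    if not location:
        raise ValueError("this sketch only models responses with a Location header")
    for pattern, prefix in PREFIX_RULES:
        if pattern.search(location):
            return prefix + location
    # Not reached: the third pattern matches any non-empty string.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    for value in ("/faq-page", "faq-page", "https://site.com/blog/faq-page"):
        print(value, "->", rewritten_location(value))
```

Because the first matching rule wins, the catch-all third regex only ever applies to values that start with neither / nor http.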
Problem 4: Visitors can access WP admin login via /blog
I really didn't want the admin login page to be accessible via site.com/blog, so I think we'll be better off if we just hide it completely. Anyone who's interested in logging in to the WP admin site needs to go to blog.site.com, log in using HTTP Basic Auth, then log in using their WP account credentials.
We can easily do that by adding a couple of location blocks:
# Return the blog's 404 page when accessing WP login/admin
location /blog/wp-admin { return 301 /blog/404; }
location /blog/wp-login.php { return 301 /blog/404; }
# Prevent the blog's robots.txt from being proxied
location /blog/robots.txt { return 404; }
location /blog {
    # ...
}
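These extra blocks work regardless of where they sit relative to location /blog, because among plain prefix locations (no regex locations or modifiers, as is the case here) Nginx picks the longest matching prefix. The Python sketch below models only that selection rule; `pick_location` and the descriptions in the dictionary are illustrative.

```python
from typing import Optional

# Prefix locations from the config above, mapped to a short description.
LOCATIONS = {
    "/blog/wp-admin": "redirect to /blog/404",
    "/blog/wp-login.php": "redirect to /blog/404",
    "/blog/robots.txt": "return 404",
    "/blog": "proxy to blog.site.com",
}


def pick_location(path: str) -> Optional[str]:
    """Return the longest configured prefix that the path starts with."""
    matches = [prefix for prefix in LOCATIONS if path.startswith(prefix)]
    return max(matches, key=len) if matches else None


if __name__ == "__main__":
    for path in ("/blog/wp-admin/index.php", "/blog/wp-login.php", "/blog/my-post/"):
        chosen = pick_location(path)
        print(path, "->", chosen, "->", LOCATIONS.get(chosen))
```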
Final Nginx config
Here's the full Nginx config if you want to copy and paste the whole thing.
Before the server block, add this:
map $upstream_http_location $_upstream_http_location_prefix { # 1
    default $upstream_http_location; # 2
    "~^/" "/blog"; # 3
    "~*^http" ""; # 4
    "~*^((?!http|\/).)*" "/blog/"; # 5
}
Then, inside the server block of your site, add this. Make sure to replace site.com with your website URL.
# Return the blog's 404 page when accessing WP login/admin
location /blog/wp-admin { return 301 /blog/404; }
location /blog/wp-login.php { return 301 /blog/404; }
# Prevent the blog's robots.txt from being proxied
location /blog/robots.txt { return 404; }
location /blog {
    # strip /blog/ from the request path
    rewrite /blog/(.*) /$1 break;
    # SSL config for proxy
    proxy_ssl_server_name on; # 1
    proxy_ssl_session_reuse on; # 2
    proxy_set_header Host blog.site.com; # 3
    proxy_set_header X-Forwarded-Proto https; # 3
    proxy_set_header X-Forwarded-Port 443; # 3
    proxy_set_header X-Real-IP $remote_addr; # 3
    proxy_set_header X-Forwarded-Host $host; # 3
    # Set HTTP Basic auth for connecting to the blog.
    proxy_set_header Authorization "Basic {CREDENTIALS}"; # 4
    # Correct Location header (if present).
    proxy_hide_header Location;
    add_header Location "$_upstream_http_location_prefix$upstream_http_location";
    # Disable response compression so we can change it.
    proxy_set_header Accept-Encoding "";
    # Change the response's links to correct ones.
    sub_filter_once off;
    sub_filter_last_modified on;
    sub_filter_types text/html text/css text/xml text/javascript application/json;
    sub_filter 'blog.site.com' 'site.com/blog';
    sub_filter 'src="/wp-content/' 'src="/blog/wp-content/';
    sub_filter 'http:' 'https:';
    # Proxy
    proxy_pass https://blog.site.com;
}
This Nginx config is something I worked on, and it is operational at the time of writing this post. The blog being served by this Nginx reverse proxy averages 1.2+ million unique visitors per month, and is doing just fine. So I can safely say that this solution is well tested in the real world.
That's it, thanks for reading the article and make sure to drop a comment if you have any questions or feedback!