Posted on July 20, 2023
In this guide we're going to discuss how to migrate a live website to a new infrastructure with no downtime and without losing data.
We'll assume in this example that we're dealing with a busy ecommerce website. The ecommerce website processes around 500 orders a day, so downtime is costly, and data is constantly moving.
Problem areas of a migration
It's important to understand the potential issues with any migration. These include:
- Data in your database (the most challenging aspect)
- Your code and changes
- Environment files and variables not in version control
- Files uploaded in your application
- DNS changes
Current architecture
Let's start by outlining our current infrastructure. We'll try to keep it simple. If you have a more complex infrastructure then this guide will likely still work, or you'll understand the nuances well enough to translate it to your own use case.
Our current infrastructure consists of a single app server, which contains our web server (Apache or nginx) and a database server (MySQL). This guide will work perfectly fine if you're on immutable architecture (e.g. multiple web servers behind a load balancer with a single database).
User uploaded files
Files uploaded by users are currently handled by S3, so there isn't much for us to worry about there as that will continue to happen. Whatever your setup, I always recommend using an external service to house user-uploaded files: it's cheaper, more reliable, easier to implement, easier to migrate, and you won't be plagued by disk space issues.
However, we'll discuss how you can migrate files as part of this guide.
The migration
OK, let's begin.
User uploaded files
In our app we use S3 for files, so really this doesn't need much attention at all, but if you're not using a third party to handle user-uploaded files then you'll need to move them there.
Unfortunately, although this process is reasonably straightforward, it will require changes to your application code. There are ways to do it without changing code (such as rsyncing to the service), but they aren't recommended.
1. Make changes to your codebase to use the service's endpoint for files.
For this example we'll assume we're using AWS S3. You'll need to make changes to your app's code to point at S3 rather than your local files. Obviously the steps to do this vary depending on your own application. I'd also recommend putting Cloudfront in front of your S3 bucket for both speed and cost savings.
You should put this change behind a feature flag. A feature flag is a concept used in software development to enable and disable features of an app very quickly. In simple terms it's a config file full of booleans. In our case it allows us to quickly switch from using local files to S3 files by changing a single variable.
Once it's all implemented and tested, leave the feature turned off so we're still serving local files.
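As a minimal sketch, the flag could be nothing more than a single variable in your .env or config file (the name here is hypothetical):
# Hypothetical feature flag: false = serve files from local disk, true = serve from S3.
USE_S3_UPLOADS=false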
2. Move existing files to S3.
Next we need to move all of our existing files to S3. The easiest way to do this is using AWS' CLI client.
First off, set up an IAM user with read-write access to a bucket on S3.
Then install AWS CLI on your server.
Then you can sync all of your files:
aws s3 sync source_folder s3://your_bucket_name/destination_folder/
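If you haven't used the CLI before, the setup and a safe dry run might look something like this (the bucket name and paths are placeholders):
# Store the IAM user's access keys for the CLI to use (interactive prompts).
aws configure
# Preview what would be copied before running the real sync.
aws s3 sync /var/www/app/uploads s3://your_bucket_name/uploads/ --dryrun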
Please note: you don't have to use AWS S3. There are now lots of other services available, often cheaper than S3, that implement the same API, so they're effectively drop-in replacements.
3. (Optional) set up Cloudfront
Cloudfront is a CDN. It has many uses, but for us it sits in front of S3 and serves files for us.
You can configure Cloudfront to run as 'origin pull', where it will pull files with the same requested path directly from S3 and then cache them at the edge.
For example, let's say you have a file:
https://example-s3-bucket.com/images/myfile.jpg
Once configured you can instead request:
https://my-example-app.cloudfront.net/images/myfile.jpg
On the first request Cloudfront will download the file to the CDN, cache it, then serve it to the user. Subsequent requests will serve the cached file.
AWS have a guide on setting this up.
4. Ensure future files are uploaded to S3
You'll need to change your application's code to ensure future files are uploaded to S3.
You have multiple options here: you can opt to upload files directly to AWS S3 from an HTML form, or you can upload them to your server and sync from there.
There is no right or wrong approach, it depends on your circumstances.
Ensure this feature also has its own feature flag, or ties into the original feature flag from step 1.
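If you take the second route while the direct-to-S3 code path is being built, a scheduled sync from the server can keep the bucket up to date in the meantime (the path and schedule are just an example):
# Hypothetical cron entry: push any newly uploaded files to S3 every five minutes.
*/5 * * * * aws s3 sync /var/www/app/uploads s3://your_bucket_name/uploads/ --quiet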
5. Go live with S3 files
Once tested and you're happy, you can resync all of your files (step 2: aws s3 sync source_folder s3://your_bucket_name/destination_folder/), and then go live.
Going live is as simple as changing the feature flag to true.
You can always re-sync after going live, just in case any files were uploaded during the last sync prior to going live.
You can get this done prior to the rest of the migration.
DNS preparation
OK, so that's user files taken care of, now we need to prep DNS.
DNS is a subject that is simple for the basics, but extremely complex when you dig into it. Fortunately we're sticking to the basics.
DNS is just a list of signposts for services. When someone's browser requests example.com their computer will ask the nameserver (DNS server) for that domain what IP address example.com points to. It's as simple as that.
A DNS record consists of:
- Record type (A, AAAA, CNAME, TXT, MX, etc.)
- Record name (example.com, www.example.com, whatever.example.com)
- Record value (an IP address for many record types, a string for others)
- TTL (time to live: effectively how long resolvers should cache the record)
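If you want to see these fields for one of your current records, dig will show them (the IP in the commented example output is a placeholder):
# Show the current A record; the second column of the answer is the TTL in seconds.
dig +noall +answer example.com A
# example.com.   3600   IN   A   203.0.113.10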
During the migration we're going to change the record value for both our root domain (example.com) and the www. record.
In preparation for this we need to do 2 things:
1. Ensure www. is a CNAME for the root domain
Assuming this works for your infrastructure, you should ensure the www. record of your domain's DNS is a CNAME that points back to the root domain:
CNAME www.example.com example.com
This is generally considered best practice, simply because you then only need to change one record to move the domain, which reduces the chance of human error, or of extremely rare issues like your DNS provider having an outage halfway through you editing two records.
Doing this ahead of time gives your DNS time to propagate to providers.
2. Reduce your TTL
The TTL of your records is effectively the time DNS providers cache the record.
During normal activities you're safe to set this to a reasonably high value such as an hour (3600) or a day (86400), since the value is specified in seconds. However, during a migration we want DNS resolvers to pick up changes as fast as possible, so it's a good idea to update this to a much lower value.
Most DNS providers have a lower cap on what they'll allow, but you should be able to set it to something like 30, which is 30 seconds.
Doing this ahead of time allows DNS providers to update their systems accordingly.
Remember though, TTL is guidance you provide and it's up to DNS resolvers to actually take note of it. Many ISPs, especially residential ISPs who provide DNS services to their customers, will heavily cache records regardless of your TTL.
Set up your new infrastructure and code freeze
At this point you'll want to set up your new infrastructure and ensure everything works as expected.
You can test it using a hosts file entry. Getting this all set up is outside the scope of this guide.
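For example, a temporary entry in /etc/hosts (or the Windows equivalent) pointing at the new server's IP, a placeholder here, lets you browse the new infrastructure as if DNS had already been switched:
# /etc/hosts on your own machine; remove this line once testing is done.
203.0.113.50   example.com www.example.com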
Once set up you should have the following in place:
- Old infrastructure still in place.
- Files on old infrastructure being uploaded to a third-party such as S3.
- DNS prepared for fast changes.
- New infrastructure in place, but not used yet.
At this point you need to implement a code freeze. No more code changes can be made, and no deployments should be made.
Migrate and sync data
The most challenging part of moving a live website that transacts is the database. The data is changing with every transaction, and your application is probably writing to the database frequently.
There are a number of solutions to moving your data without causing any data loss. We'll be focusing on two solutions here (there are others).
1. Migrate and remote connect
The first and simplest option is to use a remote connection. Read on for how this works, but be aware that you could be unlucky and have a small amount of "data loss" (it is recoverable) with this approach, so it's important you understand the risk involved.
OK, so this approach involves setting up your new infrastructure to allow remote connections. You then manually migrate your data from old infrastructure (create an export, transfer it to new server, import it). Once migrated you change the connection details on the old infrastructure to point to the database on the new infrastructure.
At this point your visitors will be using the web server on the old infrastructure, but the database server on the new infrastructure. This allows you to then make DNS changes and it doesn't really matter how long it takes to propagate.
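As a rough sketch, assuming MySQL and with placeholder database, user and host names, the export/transfer/import might look like:
# On the old server: dump the database (--single-transaction avoids locking InnoDB tables).
mysqldump -u app_user -p --single-transaction app_db | gzip > app_db.sql.gz
# Copy the dump across to the new server.
scp app_db.sql.gz deploy@new-db-server:/tmp/
# On the new server: import into a database you've already created.
gunzip -c /tmp/app_db.sql.gz | mysql -u app_user -p app_db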
There are risks. Running an export, transferring it, and importing again takes time. How much time depends on the amount of data and the resources available on all machines involved. If you don't have a particularly busy website then this is probably OK.
If the worst should happen and your website makes a sale during the migration, the data isn't lost; it will be sitting in the old database. Depending on the complexity of your data schema it might be straightforward to manually move the missing rows to your new database.
2. Create a master/slave setup
MySQL has a concept called master/slave replication (in newer versions referred to as source/replica replication).
Effectively, the master database acts as you would expect: all writes go into it, and any data being read by the application is read from there.
The slave is never read from, but any writes that happen also happen to the slave. You effectively get a real-time copy of the database that is constantly being updated as changes happen.
The concept here is to set up the database on the old infrastructure as the master, and set up the database on the new infrastructure as the slave. Once you're happy data is being written to both you can simply switch roles so the new infrastructure becomes the master.
This approach seems much simpler, and won't incur any data loss. However, setting up remote master/slave instances isn't as straightforward as it may seem and it adds a level of complication that may not be favourable.
Although this solution can be housed entirely in MySQL, it is possible to do it within the application itself. Many application frameworks support master/slave setups, where you configure a single 'write' connection alongside one or more 'read' connections (which also lets you load balance reads across databases). You may find it easier to implement the master/slave setup in your application rather than in MySQL.
DigitalOcean have a great guide on MySQL replication.
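At a very high level, and leaving the detail to that guide, the setup looks something like this. The server IDs, user, password, log file and position are placeholders (the position comes from SHOW MASTER STATUS), and on newer MySQL versions the equivalent statements are CHANGE REPLICATION SOURCE TO and START REPLICA:
# On the master (old server): set server-id = 1 and enable log_bin in my.cnf, restart MySQL,
# then create a replication user and note the current binlog file and position.
mysql -u root -p -e "CREATE USER 'repl'@'%' IDENTIFIED BY 'change-me'; GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%'; SHOW MASTER STATUS;"
# On the slave (new server): set server-id = 2 in my.cnf, restart MySQL, import a snapshot
# of the data, then point it at the master and start replicating.
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='old-server-ip', MASTER_USER='repl', MASTER_PASSWORD='change-me', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=157; START SLAVE;"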
Environment files & tertiary setup
By this point you should have tested your application on the new infrastructure many times, so it should be obvious whether anything isn't working. However, there are a few gotchas that can come up when moving between infrastructures, and they're worth checking at this point:
- Ensure you have all environment variables set up on the new infrastructure (.env file or equivalent).
- Ensure you have a deployment process in place for the new infrastructure. This differs massively depending on your setup, but make sure that once you make the switch you aren't deploying code changes to the old infrastructure.
- Ensure your scheduler (cron or equivalent) is set up and running on your new infrastructure.
- Ensure any queue runners and daemons are set up and running on your new infrastructure (see the quick checks below).
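A couple of quick sanity checks on the new server might look like this (the user and service names are hypothetical and depend entirely on your setup):
# List the deploy user's cron entries to confirm the scheduler is in place.
crontab -l -u deploy
# Confirm the queue worker daemon is running under systemd.
systemctl status app-queue-worker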
Go live
So at this point you should have:
- Identical code on old and new infrastructure
- User uploaded files on a third party like S3
- Database is now on new infrastructure
- Code is still being run on old infrastructure
- DNS is prepared with a low TTL and a CNAME on www.
- Tested new infrastructure thoroughly
All that is left to do is to change your DNS record on the root domain to point to the new infrastructure. You should start to see traffic within a minute or two, and within 24 hours (remember, not all DNS providers follow your TTL) all traffic should be hitting the new infrastructure.
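You can spot-check propagation against a couple of public resolvers; once they return the new server's IP you know the change is out there:
# Query Google's and Cloudflare's public resolvers directly.
dig @8.8.8.8 +short example.com A
dig @1.1.1.1 +short example.com A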
The benefit of this piecemeal approach to the migration is that if you encounter an issue at any point, you can easily roll it back. So after moving DNS, if the new infrastructure catches fire, you can just switch the DNS back and you shouldn't encounter any data loss.
Summary
OK, so let's go through what we did at a high level.
- Ensured user-uploaded files are stored on an external service. If not, migrate them to S3 or similar.
- Prepped DNS.
- Migrated data.
- Began using database on new infrastructure, with a failover back to old infrastructure.
- Switched DNS to new infrastructure.
At any point in this whole process we have a live backup of the previous setup to revert to. For example, if you switch the database to new infrastructure and it fails, you can switch to old without any data loss (especially if you use master/slave), or if you switch DNS and something fails you can easily switch back with no data loss or downtime.
An exact migration plan will differ depending on your application's specifics though. More complex applications may also have additional requirements that need handling, such as:
- Key-value storage for sessions or caching. If you have this then you'll need to understand how you're going to invalidate sessions upon moving infrastructure. You don't want to destroy people's cart sessions as they're about to transact.
- Queue and job processing. Many applications offload tasks to a queue to be processed. There are many solutions for handling this, but if you're handling it all within your own infrastructure then you'll need to ensure it is also moved without any interruption. Most applications support multiple queues, so it should be trivial to pause queue processing, migrate to the new infrastructure, and then resume processing there.
- Webhooks. Webhooks shouldn't be affected by the migration, because both sets of infrastructure should be able to handle inbound web requests against the same database at all times. But if the worst happens and you encounter any issues that result in downtime, you will need a recovery process in place to handle missed webhooks. Most services will retry failed webhooks after a period of time, and will offer a way to manually fire them at a later point should you need to; just remember to check them.
- Other services. There's no one-size-fits-all migration plan. You know your application; if there's something not considered here then you'll need to ensure it's handled properly.
Hopefully this is of use to you. If you have any questions or suggestions, feel free to leave a comment.