Discover Sclone - sync S3 and Swift storages across multiple cloud providers and locations


Steeve

Posted on September 11, 2023


Hi / Bonjour šŸ‘‹

Introduction

Let me introduce the latest tool we open-sourced and have been using for a couple of months: Sclone, short for "Storage Clone", a program to sync files to and from different cloud storage providers supporting S3 and OpenStack Swift.

Why? ~story mode~

In 2021, our cloud provider lost a data center to a fire (I won't name it šŸ‘€). Our APIs and application were impacted, slow or not responding, and most importantly, our clients were affected. Fortunately, our API was served from different locations. Following the event, we created a big master plan to make our infrastructure resilient to worst-case scenarios (network down, DNS down, service down, and more...).

One part of the plan was to migrate from OpenStack Swift storage to S3 to improve our performance, and to replicate buckets across multiple cloud providers for resiliency, such as:

šŸŖ£ (Main S3/Germany) <=>  šŸŖ£ (Clone S3/Paris) => šŸŖ£ (Clone S3/London)

# Legend:
# šŸŖ£  S3 Bucket
# <=> bidirectional sync
# =>  unidirectional sync

If one S3 bucket is not accessible, our production application can switch automatically to a clone bucket at a different location/cloud provider.

We needed a tool for a smooth migration that could do bidirectional synchronisation between a Swift container and S3 buckets. That way, we could migrate every version of our production API to the new S3 storage non-disruptively and with peace of mind (this is important).

The only viable solution was Rclone, but:

  • Unidirectional sync is slow for hundreds of gigabytes.
  • Bidirectional sync is unmaintained and not recommended for production (related issue).

So, we developed our own tool: Sclone.

Benefits

  • High performance: file transfers are split into parallel queues. At the end of each run, the list of files is cached to speed up the next execution. (See the benchmarks to learn more.)
  • Two modes: unidirectional or bidirectional sync.
  • Two storages supported (for the moment): S3 and OpenStack Swift.
  • Automation: set a schedule to run the synchronisation automatically using cron syntax (optional).
  • File integrity: file MD5 hashes are checked during transfers.
  • Free & Open source šŸ«¶

Getting Started

  1. Download the latest binary from the release page.
  2. Create a config.json file next to the binary to define the source/target storage credentials and the synchronisation mode. An example config, config.default.json, is available at the root of the repository (a purely illustrative sketch is also shown after these steps).
  3. Enable options if needed, such as dryRun for the first execution: it will write to a log file all the operations that would be performed, without actually carrying them out. Learn more about it on the configuration page.
  4. Finally, start the synchronisation:
./sclone-1.0.0-linux
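
To give an idea of what step 2 looks like, here is a minimal, purely illustrative sketch of a config.json. The field names (mode, source, target, dryRun, cron) and the endpoint values below are assumptions for illustration only; the authoritative schema is the config.default.json shipped at the root of the repository.

{
  "mode": "bi",
  "source": {
    "name": "s3",
    "url": "s3.example-provider-a.com",
    "region": "de",
    "bucket": "main-bucket",
    "accessKeyId": "xxxxxxxx",
    "secretAccessKey": "xxxxxxxx"
  },
  "target": {
    "name": "s3",
    "url": "s3.example-provider-b.com",
    "region": "gra",
    "bucket": "clone-bucket",
    "accessKeyId": "xxxxxxxx",
    "secretAccessKey": "xxxxxxxx"
  },
  "dryRun": true,
  "cron": "0 */6 * * *"
}

The cron value above, for instance, would re-run the sync every six hours.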

Sync logic

First, Sclone fetches the list of files from the source and target storages. Because the S3/Swift list APIs are paginated, this stage runs in only two queues: one per storage.
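
To illustrate the idea (a sketch only, not Sclone's actual code): with marker/continuation-token pagination, each page request depends on the marker returned by the previous one, so listing a storage is inherently a single sequential queue. The listPage helper below is hypothetical.

// Sketch: listing a storage with marker-based pagination (TypeScript).
// "listPage" is a hypothetical helper, not part of Sclone.
type Page = { keys: string[]; nextMarker?: string };

async function listAll(listPage: (marker?: string) => Promise<Page>): Promise<string[]> {
  const keys: string[] = [];
  let marker: string | undefined;
  do {
    const page = await listPage(marker); // each page needs the previous marker
    keys.push(...page.keys);
    marker = page.nextMarker;
  } while (marker !== undefined);
  return keys;
}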

In the second stage, the sync plan is computed from the mode, the two lists of files, and the cache of the previous synchronisation.

In the last stage, the transfers are executed in parallel:

  • Unidirectional mode: Sclone adds, updates, and deletes (if enabled) files from a source to a destination storage, based on the files' MD5 hashes.
  • Bidirectional mode: two-way synchronisation between a source and a target storage, without deleting any files (unless the delete option is enabled). Sclone compares both the files' MD5 hashes and modification times. If the MD5 differs, only the newest file is kept.
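
As a rough sketch of the bidirectional rule just described (the types and function below are mine, not Sclone's implementation, and they leave out the delete option and the previous-run cache):

// Illustrative decision rule for bidirectional mode; not Sclone's actual code.
interface FileMeta { key: string; md5: string; lastModified: Date; }

type Action =
  | { op: "copy"; to: "source" | "target"; key: string }
  | { op: "skip"; key: string };

function decideBidirectional(key: string, src?: FileMeta, dst?: FileMeta): Action {
  if (src && !dst) return { op: "copy", to: "target", key }; // missing on target
  if (!src && dst) return { op: "copy", to: "source", key }; // missing on source
  if (src && dst && src.md5 !== dst.md5) {
    // MD5 differs: the newest file wins and is copied to the other side
    return src.lastModified > dst.lastModified
      ? { op: "copy", to: "target", key }
      : { op: "copy", to: "source", key };
  }
  return { op: "skip", key }; // identical content, nothing to do
}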

If the cron option is enabled, the sync is re-executed on the defined schedule. Subsequent syncs are faster: only new, updated, or deleted files are processed.

Under the hood

Sclone, made with Node.js, leverages two powerful packages we also open-sourced:

  • Tiny-storage-client: a Node client that takes a list of S3/Swift storages. If one request fails (error 500, storage not accessible, etc.), the request is re-executed on the second S3 of the storage list. (Only two dependencies: rock-req, and aws4 for signing requests.)
  • Rock-req: the fastest lightweight HTTP client for Node.js, with reliable retries and zero dependencies.

Binaries for macOS and Linux are built with Pkg, which turns a Node.js project into an executable.
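
For reference, a typical Pkg invocation looks like the following; the entry point and target here are hypothetical, not the project's actual build script.

npx pkg index.js --targets node18-linux-x64 --output sclone-linux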

Conclusion

We succeeded in migrating hundreds of GB from Swift containers to multiple S3 buckets, and Sclone is still running to keep all our buckets in sync! šŸŖ£ No more storage downtime: thanks to Tiny-storage-client, our applications can switch to any bucket if something goes wrong!

Feel free to try Sclone; I'll be happy to get your feedback and answer your questions.

Leave a like to support my article or follow me to be notified of my next articles šŸ”„

Thanks for reading! Cheers šŸ»
