System Design: Google Drive That Scales To The Moon
Shreyansh sheth
Posted on May 8, 2023
Introduction
A few months ago I built an internal project that lets you upload files like Google Drive but also works as a CDN, so you can use those assets on the front end. Here are a few lessons I learned along the way.
Requirement
The requirement is simple: a multi-tenant app that lets users upload and download files within a workspace, make a file public, and serve a file through a single cached URL.
The file URL will look something like /:workspaceid/:key
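As a rough illustration of that URL shape, here is a minimal sketch that splits a request path into its workspace ID and file key. The names `FileRef` and `parse_file_path` are hypothetical and only meant to show the idea, not the project's actual routing code.

```python
from typing import NamedTuple, Optional


class FileRef(NamedTuple):
    workspace_id: str
    key: str


def parse_file_path(path: str) -> Optional[FileRef]:
    """Split '/<workspaceid>/<key>' into its two parts; return None if malformed."""
    parts = path.strip("/").split("/")
    # Exactly two non-empty segments are expected: workspace ID, then file key.
    if len(parts) != 2 or not all(parts):
        return None
    return FileRef(workspace_id=parts[0], key=parts[1])
```

Anything that does not match the two-segment pattern is rejected up front, so later steps never see a half-parsed path.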
Technology Choices
Storage
There are two main options for storage, S3 and R2. I chose S3 because I have used it before and it is the more mature solution, with most issues and their solutions already documented online.
Functions
I used Lambda functions, because they work well with DynamoDB, which I chose for file metadata.
File Metadata
For file metadata, we store the MIME type, size, and original name of the file, along with the indexes that we will talk about later in this post.
Auth
For authentication, we used a simple cookie-based JWT system, which doesn't matter much in the context of this post.
NOTE: The front end was built with React (Vite).
Let's Start Designing
What Do We Need?
- upload a file
- delete a file
- make a file public
- make a file available via CDN with caching
- let workspace users see all their files in the UI
Database Design
The file-metadata table design is very simple: workspaceId is the partition key and the file key is the sort key.
The workspace ID is the partition key because DynamoDB stores all items that share a partition key together, so listing a workspace's files for the front end is an efficient query against a single partition key.
The file key is the sort key. It is a ULID (Universally Unique Lexicographically Sortable Identifier), so items sort by upload time, which is the order the UI lists them in. It is also unique, so it makes querying a single file easy.
For ULID see this blog
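To make the sorting property concrete, here is a minimal ULID generator sketch written against the public ULID layout (48-bit millisecond timestamp followed by 80 random bits, encoded in Crockford base32). It is an illustration only; the function name `new_ulid` is hypothetical, and a real project would more likely use an existing ULID library.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet; its characters are in ascending ASCII order,
# which is what makes the encoded strings sort like the numbers they encode.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID: 10 timestamp chars, then 16 random chars."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness           # 128 bits in total
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp occupies the leading characters, a key created later always compares greater as a plain string than a key created in an earlier millisecond, which is exactly what a DynamoDB sort key needs to return files in upload order.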
Chart That Represents The DB Design
As you can see, this is the metadata table for the uploaded files, and it contains every field that is needed for one purpose or another.
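In code, one row of that table could look roughly like the sketch below. Only `workspaceId`, `isDeleted`, and `deleteAfter` are names taken from this post; the other field names (`key`, `originalName`, `mimeType`, `size`, `isPublic`) and the epoch-seconds unit are assumptions. The `list_workspace_files` helper is an in-memory stand-in for a DynamoDB query on the partition key, not a real database call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileMetadata:
    workspaceId: str                   # partition key
    key: str                           # sort key (a ULID, so it sorts by upload time)
    originalName: str                  # assumed field name
    mimeType: str                      # assumed field name
    size: int                          # bytes; assumed field name
    isPublic: bool = False             # assumed field name
    isDeleted: bool = False
    deleteAfter: Optional[int] = None  # epoch seconds; the unit is an assumption


def list_workspace_files(items: List[FileMetadata], workspace_id: str) -> List[FileMetadata]:
    """Mimic a query on one partition key, returned in sort-key order."""
    matches = [item for item in items if item.workspaceId == workspace_id]
    return sorted(matches, key=lambda item: item.key)
```

The point of the helper is only to show the access pattern: every list request touches one workspace's items and gets them back already ordered by the sort key.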
System Design
The system design consists of parts built on AWS Lambda, S3, CloudFront, and Lambda@Edge.
File Upload Service
Takes the file type, size, and MIME type, saves them to DynamoDB, and returns a time-limited POST upload URL that lets the front end upload the file directly to S3.
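A minimal sketch of the upload step follows. It only builds the arguments that could be passed to boto3's `generate_presigned_post` on an S3 client; the function name, the `<workspaceId>/<fileKey>` object layout, and the 5-minute default expiry are all assumptions made for illustration.

```python
from typing import Any, Dict


def build_presigned_post_args(
    bucket: str,
    workspace_id: str,
    file_key: str,
    mime_type: str,
    size: int,
    expires_in: int = 300,  # seconds; assumed default, not from the original design
) -> Dict[str, Any]:
    """Build keyword arguments for an S3 presigned POST restricted to one file."""
    return {
        "Bucket": bucket,
        # Assumed object layout that mirrors the /:workspaceid/:key URL.
        "Key": f"{workspace_id}/{file_key}",
        "Fields": {"Content-Type": mime_type},
        "Conditions": [
            {"Content-Type": mime_type},        # the upload must declare this MIME type
            ["content-length-range", 1, size],  # reject bodies larger than announced
        ],
        "ExpiresIn": expires_in,
    }
```

In a Lambda handler this would be used along the lines of `s3.generate_presigned_post(**args)`, after the metadata row has been written. Pinning the content type and the size range in the policy means the browser cannot upload something different from what it announced.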
File Delete CRON
This is a scheduled (CRON) function that periodically deletes files from S3, and their records from DynamoDB, based on the deleteAfter and isDeleted fields, which it reads through a GSI on the table. The retention period that works for us is 4 days, so if a recovery request comes in we can resolve it quickly.
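The selection logic can be sketched as below, reading the design as a soft delete followed by a purge once `deleteAfter` has passed. Items are plain dicts here, the helper names are hypothetical, and storing `deleteAfter` as epoch seconds is an assumption; the real function would read the candidates from the GSI instead of a list.

```python
from typing import Any, Dict, List

RETENTION_SECONDS = 4 * 24 * 60 * 60  # the 4-day recovery window


def mark_deleted(item: Dict[str, Any], now: int) -> Dict[str, Any]:
    """Soft-delete: flag the item and schedule its purge after the retention window."""
    return {**item, "isDeleted": True, "deleteAfter": now + RETENTION_SECONDS}


def due_for_purge(items: List[Dict[str, Any]], now: int) -> List[Dict[str, Any]]:
    """Return the soft-deleted items whose retention window has expired."""
    return [
        item
        for item in items
        if item.get("isDeleted")
        and item.get("deleteAfter") is not None
        and item["deleteAfter"] <= now
    ]
```

Until `deleteAfter` is reached the object is still in S3, so restoring a file only requires clearing the two fields on its metadata row.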
Get Private File
It looks at the auth token, which identifies the workspace ID, and returns a presigned GET URL that lets you view the private file.
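The authorization check behind this could look roughly like the following sketch. It assumes the JWT has already been verified and decoded into a `claims` dict carrying a `workspaceId` entry; the claim name, the `Forbidden` exception, and the `<workspaceId>/<fileKey>` object layout are assumptions.

```python
from typing import Any, Dict


class Forbidden(Exception):
    """Raised when the token's workspace does not match the requested file."""


def authorize_private_read(claims: Dict[str, Any], workspace_id: str, file_key: str) -> str:
    """Return the S3 object key the caller may read, or raise Forbidden."""
    if claims.get("workspaceId") != workspace_id:
        raise Forbidden("token does not belong to this workspace")
    # Assumed object layout that mirrors the /:workspaceid/:key URL.
    return f"{workspace_id}/{file_key}"
```

The returned object key would then be handed to something like boto3's `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=...)` to produce the short-lived GET URL.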
Make File Public
If the file is public, it is served from CloudFront with its caching layer. I also added an image resizer there as a bonus.
Here is the basic system design chart, which gives you a top-level view of everything.
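One way the caching part could be wired up is a Lambda@Edge function on the origin-response event that stamps a Cache-Control header on successful responses. This is a sketch under assumptions, not the setup described here: the one-day `MAX_AGE_SECONDS` value and the choice of the origin-response trigger are made up for illustration.

```python
from typing import Any, Dict

MAX_AGE_SECONDS = 86400  # assumed cache lifetime of one day


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda@Edge origin-response handler: make successful responses cacheable."""
    response = event["Records"][0]["cf"]["response"]
    # CloudFront passes the status as a string and headers as
    # lowercase-name -> list of {"key", "value"} entries.
    if response["status"] == "200":
        response["headers"]["cache-control"] = [
            {"key": "Cache-Control", "value": f"public, max-age={MAX_AGE_SECONDS}"}
        ]
    return response
```

Leaving non-200 responses untouched keeps errors such as a missing file from being cached for a full day.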
The End
If you liked this article, please consider following me on LinkedIn. And if you have a cool project like this, just contact me. I would love to work with awesome people and projects.
LinkedIn