File Storage Hustle
saisumith
Posted on May 13, 2024
A deep dive into file storage challenges and solutions.
Recently I was working on a freelance project in which my employer asked me to build a user profile page for a social media app. At first glance it seemed like an easy task: take some form values from the front end and store them in a Postgres database. But when it came to the avatar, I realised how many edge cases you have to handle before you can safely store a file.
The internet is not a safe place, so you can’t trust every user and store their input as is; a single oversight could corrupt your entire storage. This blog covers the strategies I employed to store files safely.
💡 The sample code is written in Go, but the fundamental concepts are the same in any language.
Metadata
It’s crucial to know what kind of file you are dealing with before performing any operations on it. The simplest indicator is its MIME type. In a multipart/form-data upload, each file part carries a Content-Type header that declares its MIME type (refer to MDN for a list of common MIME types). In the popular Fiber framework for Go, a handler that validates this header does the following:
- Calling the c.FormFile method returns the file header, which stores the file’s metadata.
- The MIME type is read from the file header.
- The MIME type is checked against an allow-list of image types. I accepted only “image/jpeg” and “image/png”, as those were the only formats permitted for the avatar.
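A minimal sketch of that allow-list check follows. It uses only the standard library so it runs on its own; Fiber’s c.FormFile returns the same *multipart.FileHeader type, so the helper could be called unchanged from a Fiber handler. The form field name "avatar", the helper names isAllowedAvatar and newFileHeader, and the sample headers are illustrative assumptions.

```go
package main

import (
	"fmt"
	"mime"
	"mime/multipart"
	"net/textproto"
)

// allowedAvatarTypes is the allow-list of MIME types accepted for avatars.
var allowedAvatarTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
}

// isAllowedAvatar reports whether the Content-Type declared in an uploaded
// file part's header is on the allow-list. ParseMediaType strips any
// parameters and lower-cases the type before the lookup.
func isAllowedAvatar(fh *multipart.FileHeader) bool {
	mediaType, _, err := mime.ParseMediaType(fh.Header.Get("Content-Type"))
	if err != nil {
		return false // missing or malformed Content-Type
	}
	return allowedAvatarTypes[mediaType]
}

// newFileHeader is a hypothetical helper that builds a file header by hand;
// in a Fiber handler the header would come from c.FormFile("avatar").
func newFileHeader(name, contentType string) *multipart.FileHeader {
	return &multipart.FileHeader{
		Filename: name,
		Header:   textproto.MIMEHeader{"Content-Type": {contentType}},
	}
}

func main() {
	png := newFileHeader("me.png", "image/png")
	gif := newFileHeader("me.gif", "image/gif")
	fmt.Println(isAllowedAvatar(png), isAllowedAvatar(gif)) // true false
}
```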
You can also detect the MIME type from the file’s binary content using libraries such as github.com/gabriel-vasile/mimetype. I wouldn’t recommend it in most cases, though: you are most likely receiving files over HTTP, where the declared type is already available, so you don’t need to spend the extra computation time.
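If you do decide to check the bytes (for example, because the Content-Type header is set by the client and can be mislabelled), Go’s standard library offers a lighter alternative to an external library: net/http.DetectContentType, which implements the WHATWG MIME Sniffing algorithm and looks at no more than the first 512 bytes. This is a swapped-in technique, not the mimetype library; sniffAvatar is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffAvatar detects the MIME type from the file's leading bytes and
// reports whether it is one of the allowed avatar types.
func sniffAvatar(data []byte) (string, bool) {
	detected := http.DetectContentType(data) // inspects at most 512 bytes
	switch detected {
	case "image/jpeg", "image/png":
		return detected, true
	default:
		return detected, false
	}
}

func main() {
	pngSignature := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniffAvatar(pngSignature))            // image/png true
	fmt.Println(sniffAvatar([]byte("<html></html>"))) // text/html; charset=utf-8 false
}
```

In a handler you would read the first 512 bytes of the opened upload and pass them in, rejecting the file when the sniffed type disagrees with the allow-list.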
Compression
The files you receive from the internet can be extremely large and could cost a fortune to store in the cloud. GCP Cloud Storage is priced at $0.020 per GB per month, so if you have 10K users and each uploads a 1 GB file, you have to store 10,000 GB, which costs $200 every month. This is a far-fetched example, but it explains why file compression is necessary.
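The arithmetic above as a small Go helper, using decimal gigabytes (1 GB = 10^9 bytes) and the $0.020 rate quoted. For comparison, the second line applies the same rate to a 150 KB file per user; monthlyStorageCost is a hypothetical name.

```go
package main

import "fmt"

// monthlyStorageCost returns the monthly bill in USD for storing one file of
// bytesPerFile bytes per user, at usdPerGBMonth per decimal gigabyte.
func monthlyStorageCost(users int, bytesPerFile, usdPerGBMonth float64) float64 {
	totalGB := float64(users) * bytesPerFile / 1e9
	return totalGB * usdPerGBMonth
}

func main() {
	const price = 0.020 // USD per GB per month
	fmt.Printf("1 GB each:   $%.2f/month\n", monthlyStorageCost(10_000, 1e9, price))   // $200.00/month
	fmt.Printf("150 KB each: $%.2f/month\n", monthlyStorageCost(10_000, 150e3, price)) // $0.03/month
}
```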
File compression is a deep topic, built on many different patterns and ideas. Our main focus is lossy compression, as it is much more effective at shrinking file size. Avatars are a natural fit for lossy compression: they are displayed at a small scale, so they don’t need to be pixel-perfect or high resolution. A deep discussion of file compression is beyond the scope of this blog, but maybe in the future I will dedicate an entire post to compression algorithms.
I strongly discourage writing your own compression algorithm, as it requires significant knowledge and expertise. Instead, use established tools such as libvips (a C image-processing library), which has bindings in several languages. For image compression I use the Go library https://github.com/jamesponddotco/imgdiet-go.
The idea is simple: the aim is to compress any image file to less than 150KB, so with that target in mind I calculate a quality percentage and pass it as a parameter to the compression library.
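As a self-contained sketch of that idea (it does not use imgdiet-go, whose API is not shown here), the block below swaps in Go’s standard image/jpeg encoder and, instead of computing a quality percentage up front, binary-searches for the highest JPEG quality whose output fits the budget. Reading 150KB as 150 × 1024 bytes is an assumption, and encodeUnder and noisyImage are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
	"math/rand"
)

// maxAvatarBytes is the size budget, reading "150KB" as 150 × 1024 bytes.
const maxAvatarBytes = 150 * 1024

// encodeUnder re-encodes img as JPEG at the highest quality (1-100) found by
// binary search whose output fits in maxBytes. The search assumes size grows
// with quality, which is typical for JPEG but not guaranteed; any returned
// data is nevertheless always within the budget.
func encodeUnder(img image.Image, maxBytes int) ([]byte, int, error) {
	lo, hi := 1, 100
	var best []byte
	bestQuality := 0
	for lo <= hi {
		q := (lo + hi) / 2
		var buf bytes.Buffer
		if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: q}); err != nil {
			return nil, 0, err
		}
		if buf.Len() <= maxBytes {
			best, bestQuality = buf.Bytes(), q // fits: try a higher quality
			lo = q + 1
		} else {
			hi = q - 1 // too large: try a lower quality
		}
	}
	if best == nil {
		return nil, 0, fmt.Errorf("image does not fit in %d bytes even at quality 1", maxBytes)
	}
	return best, bestQuality, nil
}

// noisyImage builds a w×h image of random pixels, a worst case for JPEG.
func noisyImage(w, h int, seed int64) *image.RGBA {
	r := rand.New(rand.NewSource(seed))
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{
				R: uint8(r.Intn(256)), G: uint8(r.Intn(256)), B: uint8(r.Intn(256)), A: 255,
			})
		}
	}
	return img
}

func main() {
	data, quality, err := encodeUnder(noisyImage(512, 512, 1), maxAvatarBytes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("encoded %d bytes at quality %d\n", len(data), quality)
}
```

Because each probe halves the range, the search needs at most seven encodes to cover qualities 1 to 100.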
Cloud Storage
Finally, it’s time to store our files and provide a publicly accessible URL. There are two ways to do this. The first is to store files on your own file system by creating a file and writing the binary data to it. This is a bad idea, as you will most likely host your servers on a cloud service such as Google’s App Engine, which gives you limited storage. A better alternative is a dedicated cloud storage service such as GCP Cloud Storage or AWS S3, which is much cheaper. Almost all cloud storage providers use the concept of buckets and objects: you create a bucket, then store files (objects) in that bucket.
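Buckets and objects map directly onto public URLs. As an illustration, assuming a publicly readable Google Cloud Storage bucket, GCS serves objects at https://storage.googleapis.com/BUCKET/OBJECT with the object name percent-encoded. The bucket and object names and the helper publicObjectURL are hypothetical; whether an object is actually readable depends on the bucket’s access settings.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// publicObjectURL builds the public URL of an object in a publicly readable
// Google Cloud Storage bucket. Each "/"-separated segment of the object name
// is percent-encoded while the separators are kept; bucket names are
// restricted to URL-safe characters, so they are used as is.
func publicObjectURL(bucket, object string) string {
	segments := strings.Split(object, "/")
	for i, s := range segments {
		segments[i] = url.PathEscape(s)
	}
	return "https://storage.googleapis.com/" + bucket + "/" + strings.Join(segments, "/")
}

func main() {
	fmt.Println(publicObjectURL("my-app-avatars", "avatars/user 42.png"))
	// https://storage.googleapis.com/my-app-avatars/avatars/user%2042.png
}
```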
I was already using Supabase for my database, so it was a no-brainer to use Supabase Storage, which uses AWS S3 underneath, to store the files.
The implementation mostly follows the supabase/storage-go documentation, but there is one small nuance. In the bucket-and-object model, if an object doesn’t exist you send a POST request to create it; in this case, a new avatar is created with the UploadFile function. If the object already exists, you send a PUT request to update it instead, using the UpdateFile function.
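The create-versus-update rule can be sketched without any network calls. The ObjectStore interface, its Upload and Update methods, the error values, the "profiles" bucket, and the in-memory memStore are hypothetical stand-ins that mimic POST (create only) and PUT (replace only) semantics; they are not the storage-go API. saveAvatar tries to create the object first and falls back to an update when it already exists.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors mirroring "already exists" (POST) and "not found" (PUT).
var (
	ErrExists   = errors.New("object already exists")
	ErrNotFound = errors.New("object not found")
)

// ObjectStore is a hypothetical bucket/object interface.
type ObjectStore interface {
	Upload(bucket, path string, data []byte) error // create only, like POST
	Update(bucket, path string, data []byte) error // replace only, like PUT
}

// memStore is an in-memory ObjectStore keyed by "bucket/path".
type memStore map[string][]byte

func (m memStore) Upload(bucket, path string, data []byte) error {
	key := bucket + "/" + path
	if _, ok := m[key]; ok {
		return ErrExists
	}
	m[key] = append([]byte(nil), data...) // store a copy
	return nil
}

func (m memStore) Update(bucket, path string, data []byte) error {
	key := bucket + "/" + path
	if _, ok := m[key]; !ok {
		return ErrNotFound
	}
	m[key] = append([]byte(nil), data...)
	return nil
}

// saveAvatar creates the avatar on first upload and replaces it afterwards.
func saveAvatar(s ObjectStore, userID string, data []byte) error {
	path := "avatars/" + userID + ".jpg"
	err := s.Upload("profiles", path, data)
	if errors.Is(err, ErrExists) {
		return s.Update("profiles", path, data)
	}
	return err
}

func main() {
	store := memStore{}
	fmt.Println(saveAvatar(store, "42", []byte("v1")))   // <nil>
	fmt.Println(saveAvatar(store, "42", []byte("v2")))   // <nil>
	fmt.Println(string(store["profiles/avatars/42.jpg"])) // v2
}
```

Trying the create first costs one request for a new user and two for a returning one.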
Conclusion
Be careful with your storage, and perform multiple validation checks before committing to store any information. The ideas discussed in this blog are based on common practices in modern web development. If you are a beginner, this should give you better insight into how to build your next app. Stay tuned for more!