[Optimise] App size reduction by 22% using Computer Vision
ZhiHong Chua
Posted on November 1, 2023
How it began
In a team code review session, someone asked, "Why is this image here? I thought we'd saved it elsewhere, so why didn't we just use that?"
My thoughts
True, it seemed likely it was saved elsewhere. And... if someone could make this mistake here, it's likely others had too.
Wait but why?
Our react-native app used to have a multi-bundle architecture that only loaded each bundle when the user needed it, which was meant to optimise load times. Naturally, images were not shared between bundles. However, it resulted in plenty of bugs and random edge cases once Push Notifications, Deeplinks and other features were taken into account.
Today, we use a pretty much monolithic bundle, but surprisingly, no one had gone back and checked for duplicate images!
Initial plan
It seemed perfect to use Computer Vision with the openCV library: parse each pixel in every image into 0/1, then compare the pixel maps to see which images were similar and report them. I was too lazy to construct the algorithm myself, so I asked ChatGPT to do it:
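The script itself was shared as a screenshot, so here is a minimal sketch of what that pairwise approach could look like, with a hypothetical IMAGE_DIR and exact equality of the 0/1 pixel maps as the similarity test:

```python
import os
import cv2
import numpy as np

IMAGE_DIR = "src/assets/images"  # hypothetical path to the app's image folder

def binary_pixel_map(path):
    """Load an image and reduce every pixel to 0/1 with a grayscale threshold."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return None
    _, binary = cv2.threshold(img, 127, 1, cv2.THRESH_BINARY)
    return binary

paths = [
    os.path.join(root, f)
    for root, _, files in os.walk(IMAGE_DIR)
    for f in files
    if f.lower().endswith((".png", ".jpg", ".jpeg"))
]

# Naive version: compare every image against every other one, O(n^2) pairs
for i in range(len(paths)):
    map_a = binary_pixel_map(paths[i])
    for j in range(i + 1, len(paths)):
        map_b = binary_pixel_map(paths[j])  # re-decoded on every pass: slow!
        if map_a is None or map_b is None:
            continue
        if map_a.shape == map_b.shape and np.array_equal(map_a, map_b):
            print(f"Possible duplicate: {paths[i]} <-> {paths[j]}")
```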
Problem
After letting it run for a bit, I realised it was taking really long, so I added print logs. Each comparison was taking about a second, and with 1033 images each compared against all the others, the full run would take:

1033 × 1033 = 1,067,089 seconds ≈ 296.41 hours ≈ 12.35 days !!??!?!!??!
I figured the problem was that each image was being compared against every other image. If n denotes the number of images, this was an O(n^2) solution. We had to do better.
Optimised plan
Rather than comparing every pair, I wanted a one-pass solution: hash each image's pixel map, store the hashes in a set / hashmap, and append each image to the group sharing the same pixel-map value. It took me less than 30 seconds to write up:
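The actual script was again a screenshot. A minimal sketch of the one-pass idea, with the same hypothetical IMAGE_DIR: it uses a plain md5 over the decoded pixels, which only catches exact matches, whereas a perceptual hash (see the reference at the end) would also tolerate small differences:

```python
import os
import hashlib
import cv2

IMAGE_DIR = "src/assets/images"  # hypothetical path, as before

groups = {}  # pixel-map hash -> list of file paths

for root, _, files in os.walk(IMAGE_DIR):
    for f in files:
        if not f.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        path = os.path.join(root, f)
        img = cv2.imread(path)
        if img is None:
            continue
        # One hash per image, so the whole scan is a single O(n) pass;
        # including the shape avoids collisions between reshaped buffers
        key = hashlib.md5(img.tobytes() + str(img.shape).encode()).hexdigest()
        groups.setdefault(key, []).append(path)

# Every bucket with more than one entry is a group of duplicate images
for paths in groups.values():
    if len(paths) > 1:
        print(f"{len(paths)} copies: {paths}")
```

Hashing turns n^2 pairwise comparisons into n hash computations plus constant-time dictionary lookups, which is where the speedup comes from.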
P.S. Thanks to ChatGPT
How long did it take now?
5 seconds
Anyway, it turned out the app size would be reduced by a whopping
22.2%
Next Up:
All that was left was to get the go-ahead from the team lead, and it would be scheduled for December!
- Short term: Manually check these duplicates and keep 1 copy, delete all others (just in case there are precision errors in the image-hash technique)
- Medium term: Also identify and remove images that are present but no longer imported by code.
- Long term: Add a check that runs on commits to warn of any duplicate images (see the sketch after this list). Also include a developer contact so any issues can be reported and rectified.
- Long long term: Migrate these images to a CDN (edit 11 Nov: see part 2 on why this might be a bad idea)
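For the commit-time check, one possible shape is a small script that fails the build whenever a duplicate appears. Everything below (the path, the contact placeholder, the hook wiring) is hypothetical:

```python
#!/usr/bin/env python3
# Hypothetical commit-time check (e.g. wired into a pre-commit hook or a CI
# step): exits non-zero when duplicate images are found. Hashing raw file
# bytes catches exact copies; the pixel-map hashing above would also catch
# re-encoded duplicates.
import os
import sys
import hashlib

IMAGE_DIR = "src/assets/images"   # hypothetical path, as before
CONTACT = "<developer contact>"   # placeholder for the team contact

groups = {}
for root, _, files in os.walk(IMAGE_DIR):
    for f in files:
        if f.lower().endswith((".png", ".jpg", ".jpeg")):
            path = os.path.join(root, f)
            with open(path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            groups.setdefault(digest, []).append(path)

failed = False
for paths in groups.values():
    if len(paths) > 1:
        print(f"WARNING: duplicate images: {paths}")
        failed = True

if failed:
    print(f"Please keep one copy and delete the rest. Issues? {CONTACT}")
    sys.exit(1)  # non-zero exit fails the commit / CI job
```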
References: https://apiumhub.com/tech-blog-barcelona/introduction-perceptual-hashes-measuring-similarity/