Data Size Real World

koukikitamura

koki kitamura

Posted on February 20, 2022

Today's applications are used by people all over the world. Such applications typically have the following characteristics:

  • Have millions of users
  • Store a large amount of data, ranging from terabytes to exabytes (TB-EB)
  • Require response times on the order of milliseconds to microseconds (ms-μs)
  • Handle millions of requests per second

The range TB-EB can be hard to grasp because numbers with that many digits rarely come up in daily life.

Software always needs hardware to run on, and that hardware should suit the characteristics of the software. Understanding data sizes helps you choose the memory and storage for your hardware. The choice covers everything from on-premises servers and virtual server instance types to PCs used at home.

Today, the advent of cloud computing has blurred the border between application engineers and infrastructure engineers. Many application teams build their own infrastructure in the cloud. Understanding data sizes helps you choose the right hardware yourself.

In this article, we'll build a sense of data size by looking at the data sizes of various things.

Data size unit

The unit of data size is a byte. Currently, 1 byte is defined as 8 bits.

Because data sizes involve many digits, a prefix is added to the unit to shorten the number. In the International System of Units (SI), the prefixes are as follows:

Symbol Name Factor Power EN
k kilo 10^3 10^3 thousand
M mega 10^3k 10^6 million
G giga 10^3M 10^9 billion
T tera 10^3G 10^12 trillion
P peta 10^3T 10^15 quadrillion
E exa 10^3P 10^18 quintillion

For data sizes, on the other hand, this article uses the following units:

Symbol Name Factor
B byte 8bit
KB kilobyte 1024B
MB megabyte 1024KB
GB gigabyte 1024MB
TB terabyte 1024GB
PB petabyte 1024TB
EB exabyte 1024PB

Computers handle data in binary, so each prefix step is a factor of 1024, which is the 10th power of 2, rather than 1000. There is also a notation that uses KiB instead of KB to distinguish these units from the SI prefixes, but this article uses KB.
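As a minimal sketch of the 1024-based units described above, the following Python converts between a value with a unit and a raw byte count. The names `UNITS`, `to_bytes`, and `humanize` are made up for this illustration and do not come from any library.

```python
# Units as used in this article: each step is a factor of 1024 (2**10).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value: float, unit: str) -> int:
    """Convert a value such as (1.96, "MB") to a byte count."""
    return int(value * 1024 ** UNITS.index(unit))


def humanize(num_bytes: float) -> str:
    """Format a byte count with the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.2f}{unit}"
        value /= 1024
    # Anything that reaches this point is expressed in the largest unit.
    return f"{value:.2f}{UNITS[-1]}"


if __name__ == "__main__":
    print(to_bytes(1, "KB"))            # bytes in one kilobyte
    print(humanize(to_bytes(32, "GB")))  # round trip through bytes
```

Keeping sizes as plain byte counts internally and formatting them only for display avoids mixing 1000-based and 1024-based values by accident.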

Website data size

According to Page Weight, the data sizes of the components of a web page are as follows:

Name Size
Total 1.96MB
HTML 31.4KB
CSS 68.9KB
JavaScript 452KB
Font 119KB
Image 956KB
Video 2.07MB

The aggregation period is from January 2017 to January 2022, and the figures are for mobile sites.

Note that each figure is the total size per page, not the size per file. Some people may find the JavaScript size larger than expected. This is because it includes not only the site's own code but also the code of external packages such as frameworks and libraries.

File size

File sizes vary with the contents of the file, so treat these figures as a rough reference. Sizes and formats were converted from the referenced files as needed. These figures are also not meant to rank file formats as good or bad, because the appropriate format depends on the characteristics of the file.

Name Size cf.
Image - small JPG (size 320 x 320) 21.0KB https://www.instagram.com/p/CZZyelKpmYf/
Image - small PNG (size 320 x 320) 137KB https://www.instagram.com/p/CZZyelKpmYf/
Image - small WebP (size 320 x 320) 16KB https://www.instagram.com/p/CZZyelKpmYf/
Image - large JPG (size 1036 x 1036) 187KB https://www.instagram.com/p/CVoHltuF7_e/
Image - large PNG (size 1036 x 1036) 1.38MB https://www.instagram.com/p/CVoHltuF7_e/
Image - large WebP (size 1036 x 1036) 148KB https://www.instagram.com/p/CVoHltuF7_e/
Audio - music MP3 (playback time 3:01) 5.80MB https://pixabay.com/music/beats-dont-you-think-lose-16073/
Movie - short MP4 720p (playback time 0:09) 851KB https://www.youtube.com/shorts/Wm3F8kF9WAE
Movie - short WebM 720p (playback time 0:09) 1.10MB https://www.youtube.com/shorts/Wm3F8kF9WAE
Movie - short GIF 720p (playback time 0:09) 3.50MB https://www.youtube.com/shorts/Wm3F8kF9WAE
Document - PDF (4 pages) 150KB -
Document - DOC (4 pages) 100KB -
Document - XLSX (1000 rows) 140KB -
Document - PPT (3 pages) 248KB -
Application - Firefox 97.0.1 (Mac) 364MB https://www.google.com/chrome/
Application - Discord 0.0.265 (Mac) 193MB https://discord.com/
Application - Zoom 5.1.1 (Mac) 52.5MB https://zoom.us/
Application - Xcode 13.2.1 (Mac) 32.1GB https://developer.apple.com/xcode/

Hardware capacity

The memory and storage capacities of various hardware are shown in the following table. Where there is no standard value, a typical value is shown as a guide.

Name Size
Memory - AWS EC2 instance t2.micro 1GB
Memory - AWS EC2 instance T2 0.5GB ~ 32GB
Memory - AWS EC2 instance M5 8GB ~ 384GB
Memory - MacBook Pro 13 inch 2020 8GB ~ 16GB
Memory - MacBook Pro 14 inch 2021 16GB ~ 64GB
Memory - iPhone (1st generation) 128MB
Memory - iPhone (13 Pro max) 6GB
Storage - AWS EBS Provisioned HDD 125GB ~ 16TB
Storage - AWS EBS Provisioned IOPS SSD 4GB ~ 16TB
Storage - AWS RDS SSD 20GB ~ 64TB
Storage - MacBook Pro 13 inch 2020 SSD 256GB ~ 2TB
Storage - MacBook Pro 14 inch 2021 SSD 1TB ~ 8TB
Storage - iPhone (1st generation) 4GB ~ 16GB
Storage - iPhone (13 Pro max) 128GB ~ 1TB
Storage - Floppy Disk 720KB ~ 1.44MB
Storage - Compact Disk 650MB ~ 700MB
Storage - DVD 4.7GB ~ 8.5GB
Storage - Blu-ray 25GB ~ 128GB
Storage - USB memory 32GB ~ 256GB

Real application data volume

According to Data Never Sleeps, the amount of data created by real applications is as follows:

Name Volume per minute Volume per day Volume per year
Twitter tweet 575K tweet/min 828M tweet/day 302G tweet/year
Instagram photo 65K photo/min 93.6M photo/day 34.2G photo/year
Slack message 148K message/min 213M message/day 77.8G message/year
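The per-day and per-year columns follow from the per-minute rate by simple multiplication. A small sketch of that arithmetic, assuming a 365-day year (the function name `scale` is made up for this illustration):

```python
MINUTES_PER_DAY = 60 * 24
DAYS_PER_YEAR = 365  # assumption: leap years are ignored

# Per-minute rates taken from the table above.
per_minute = {
    "Twitter tweet": 575_000,
    "Instagram photo": 65_000,
    "Slack message": 148_000,
}


def scale(rate_per_minute: int):
    """Return (items per day, items per year) for a per-minute rate."""
    per_day = rate_per_minute * MINUTES_PER_DAY
    per_year = per_day * DAYS_PER_YEAR
    return per_day, per_year


if __name__ == "__main__":
    for name, rate in per_minute.items():
        per_day, per_year = scale(rate)
        print(f"{name}: {per_day:,}/day, {per_year:,}/year")
```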

Next, let's look at it in bytes. Assuming a tweet has 100 characters and one character is 1 byte, one tweet is 0.1KB. One Instagram photo is assumed to be 0.1MB. Assuming a Slack message has 50 characters and one character is 1 byte, one message is 0.05KB. Under these assumptions, the yearly sizes are as follows:

Name Size per year
Twitter tweet 30.2TB/year
Instagram photo 3.42PB/year
Slack message 3.89TB/year
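The yearly sizes can be checked with a short calculation. The sketch below uses the per-item sizes assumed above and a 365-day year. The table's figures appear to come from multiplying the prefixes directly (treating each step as 1000), so the code prints both that value and the 1024-based value, which comes out somewhat smaller. The names `workloads` and `bytes_per_year` are made up for this illustration.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumption: leap years are ignored

# name -> (items per minute, assumed bytes per item)
workloads = {
    "Twitter tweet": (575_000, 100),        # 100 one-byte characters
    "Instagram photo": (65_000, 100_000),   # 0.1MB, taking 1MB as 10**6 bytes
    "Slack message": (148_000, 50),         # 50 one-byte characters
}


def bytes_per_year(items_per_minute: int, bytes_per_item: int) -> int:
    """Total bytes produced in one year at a constant per-minute rate."""
    return items_per_minute * MINUTES_PER_YEAR * bytes_per_item


if __name__ == "__main__":
    for name, (rate, size) in workloads.items():
        total = bytes_per_year(rate, size)
        print(
            f"{name}: {total / 10**12:.2f}TB/year (1000-based), "
            f"{total / 2**40:.2f}TB/year (1024-based)"
        )
```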

If you operate such a service for one year, terabytes or more of data accumulate. It is difficult to store this amount of data on a single database server, so you need to scale out and spread the data across multiple database servers. This is called a distributed database.

There are two ways to build a distributed database: the Master/Slave method and the partitioning method. The Master/Slave method addresses high traffic, not large data volume, because every replica still holds a full copy of the data. For a large amount of data, use partitioning.

Relational databases (RDBs) are not designed for partitioning, so maintaining a partitioned RDB is costly. If you want to build a partitioned distributed database, consider a database designed for partitioning, such as DynamoDB.
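To make the idea of partitioning concrete, here is a minimal sketch of hash-based partitioning in Python. It is only an illustration: in-memory dictionaries stand in for database servers, the names `partition_for` and `PartitionedStore` are made up for this example, and it is not a description of how DynamoDB or any other product works internally.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Toy key-value store that spreads keys over several partitions."""

    def __init__(self, num_partitions: int) -> None:
        # Each dict stands in for one database server.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key: str) -> dict:
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key: str, value: str) -> None:
        self._partition(key)[key] = value

    def get(self, key: str):
        return self._partition(key).get(key)


if __name__ == "__main__":
    store = PartitionedStore(num_partitions=4)
    for i in range(1000):
        store.put(f"tweet-{i}", f"body of tweet {i}")
    print([len(p) for p in store.partitions])  # keys held by each partition
    print(store.get("tweet-42"))
```

Each key lives on exactly one partition, so total capacity grows with the number of partitions. One trade-off of this simple modulo scheme is that changing the number of partitions moves most keys to a different partition, which is part of why maintaining a partitioned database by hand is costly.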
