15 Things that you must know about AWS S3 (Simple Storage Service)

aziz_amghar

aziz.amghar

Posted on June 21, 2021


1. S3 is a secure and scalable storage service

You can securely store your files (called objects) in S3; a single object can be up to 5 TB in size.

2. Object attributes:
S3 objects can have:

  • Key (the name of the object)
  • Value (the data itself)
  • Version ID
  • Metadata (data about the data you are storing)
  • Subresources: access control lists, torrents

3. S3 naming conventions:
There are some rules that you must respect when naming your S3 buckets:

  • No uppercase letters and no underscores
  • 3-63 characters long
  • Must not be formatted as an IP address, and must start with a lowercase letter or a number
  • Bucket names live in a universal namespace, so each name must be globally unique.
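A simplified validator for these rules can be sketched in Python (AWS enforces a few more edge cases, e.g. no adjacent periods and no `xn--` prefix, so treat this as illustrative only):

```python
import re

# Simplified check of the basic S3 bucket-naming rules: 3-63 characters,
# lowercase letters, digits, hyphens and periods only, must start and end
# with a letter or digit, and must not look like an IP address.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic bucket-naming rules."""
    return bool(BUCKET_NAME_RE.match(name)) and not IP_RE.match(name)
```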

4. S3 has the following features:

  • Tiered storage
  • Lifecycle management
  • Versioning
  • Encryption
  • MFA Delete (multi-factor authentication): can only be enabled via the CLI.
  • Data secured using ACLs (Access Control Lists) and bucket policies.
  • Pre-signed URLs: URLs that are valid only for a limited time (e.g. a premium video service for logged-in users).

5. S3 storage classes:

  • S3 Standard: 99.99% availability, 99.999999999% (11 9's) durability; it is the default storage class.
  • S3 IA (Infrequent Access)
  • S3 One Zone-IA
  • S3 Intelligent-Tiering
  • S3 Glacier (for data archiving, 99.999999999% durability of archives)
  • S3 Glacier Deep Archive (data retrieval within 12 hours)

S3 Pricing Tiers:
You pay per:

  • Storage
  • Requests and data retrieval
  • Data transfer

Most expensive is S3 Standard, followed in decreasing order of price by:

  • S3 IA
  • S3 Intelligent-Tiering
  • S3 One Zone-IA
  • S3 Glacier
  • S3 Glacier Deep Archive.
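As a rough sketch of how these three billing dimensions combine, consider the toy calculator below. The default rates are placeholders roughly in the ballpark of published S3 Standard list prices, not authoritative figures; always check the current S3 pricing page for your region.

```python
# Illustrative only: the default rates are placeholders, not real AWS prices.
# The point is the billing model: storage + requests + data transfer out.
def estimate_monthly_cost(gb_stored, requests, gb_transferred_out,
                          price_per_gb=0.023, price_per_1k_requests=0.005,
                          price_per_gb_out=0.09):
    storage = gb_stored * price_per_gb
    reqs = (requests / 1000) * price_per_1k_requests
    transfer = gb_transferred_out * price_per_gb_out
    return round(storage + reqs + transfer, 2)

# e.g. 100 GB stored, 10,000 requests, 10 GB transferred out in a month:
print(estimate_monthly_cost(100, 10_000, 10))
```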

6. S3 Encryption:

Two types of encryption:

  • Encryption in transit: SSL/TLS
  • Encryption at rest (server-side), of which there are three types:
    • SSE-S3: S3-managed keys
    • SSE-KMS: keys managed by the AWS Key Management Service
    • SSE-C: server-side encryption with customer-provided keys
  • There is also client-side encryption, where you encrypt the data before uploading it.
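As a sketch, assuming the boto3 SDK: the server-side mode is chosen per request through the `ServerSideEncryption` parameter of `put_object`. The bucket, key and KMS alias below are hypothetical.

```python
# Request parameters selecting a server-side encryption mode on upload.
# Names follow the boto3 put_object API; bucket/key/alias are placeholders.
sse_s3 = {"Bucket": "my-bucket", "Key": "report.csv",
          "ServerSideEncryption": "AES256"}          # SSE-S3
sse_kms = {"Bucket": "my-bucket", "Key": "report.csv",
           "ServerSideEncryption": "aws:kms",
           "SSEKMSKeyId": "alias/my-key"}            # SSE-KMS

# With boto3 you would pass these as: s3_client.put_object(**sse_kms, Body=data)
```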

8. S3 Security:

  • User based: IAM policies.
  • Resource based, which can be managed in three ways:
    • Bucket policies, used to:
      • Grant public access to the bucket
      • Force objects to be encrypted at upload
      • Grant access to another account (cross-account)
    • Object ACLs
    • Bucket ACLs
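For example, a common bucket-policy pattern for forcing encryption at upload is to deny any `PutObject` request that lacks the server-side-encryption header. Sketched here in Python purely to emit the JSON; `my-bucket` is a placeholder name.

```python
import json

# Bucket policy that forces encrypted uploads: deny any PutObject request
# that does not carry the x-amz-server-side-encryption header.
force_encryption_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}

print(json.dumps(force_encryption_policy, indent=2))
```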

9. S3 CORS:

  • If a website requests data from an S3 bucket on a different origin, you need to enable CORS on that bucket.
  • Cross-Origin Resource Sharing lets you limit which websites can request your files in S3, and thus limit your costs.
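A minimal CORS configuration might look like the sketch below (assuming boto3, you would apply it with `put_bucket_cors`; the origin is a placeholder):

```python
# Example CORS configuration restricting S3 requests to a single origin.
# Apply with: s3_client.put_bucket_cors(Bucket="my-bucket",
#                                       CORSConfiguration=cors)
cors = {
    "CORSRules": [{
        "AllowedOrigins": ["https://www.example.com"],  # placeholder origin
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight
    }]
}
```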

10. Consistency Model

  • Historically, S3 offered read-after-write consistency only for PUTs of new objects:
    • As soon as a new object was written, you could retrieve it (ex: PUT 200 -> GET 200)
    • Except if you did a GET beforehand to see whether the object existed (ex: GET 404 -> PUT 200 -> GET 404) – eventually consistent
  • And eventual consistency for DELETEs and PUTs of existing objects:
    • Reading an object right after updating it might return the older version (ex: PUT 200 -> PUT 200 -> GET 200 (might be the older version))
    • After deleting an object, you might still be able to retrieve it for a short time (ex: DELETE 200 -> GET 200)
  • Since December 2020, S3 delivers strong read-after-write consistency for all PUTs and DELETEs, so the caveats above no longer apply.

11. S3 Access Logs:

  • For audit purposes
  • Any request made to S3, from any account, authorized or denied, can be logged to another S3 bucket
  • That data can then be analyzed using tools like Amazon Athena.

12. S3 pre-signed URLs:

  • You can generate pre-signed URLs using the SDK or the CLI
    • For downloads (easy, can use the CLI)
    • For uploads (harder, must use the SDK)
  • Valid for a default of 3600 s; you can change the timeout with the --expires-in <seconds> argument
  • Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT. Examples:
    • Allow only logged-in users to download a premium video from your S3 bucket
    • Allow an ever-changing list of users to download files by generating URLs dynamically
    • Temporarily allow a user to upload a file to a precise location in your bucket

13. S3 Performance:

  • Baseline performance:
    • S3 scales automatically to high request rates, with 100-200 ms latency.
    • Your app can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, per prefix, in a bucket.
  • KMS limitation:
    • If you use SSE-KMS, you may be impacted by the KMS limits
    • Each upload calls the GenerateDataKey KMS API
    • Each download calls the Decrypt KMS API
    • These calls count towards the KMS requests-per-second quota (5,500, 10,000 or 30,000 req/s depending on the region)
    • A KMS quota increase can be requested through the Service Quotas console
  • Multipart upload:
    • Recommended for files > 100 MB, mandatory for files > 5 GB
    • Can parallelize uploads (the file is divided into parts, which speeds up transfers)
  • S3 Transfer Acceleration (upload only): transfers are routed through AWS edge locations to speed them up
  • S3 byte-range fetches:
    • Parallelize GETs by requesting specific byte ranges
    • Better resilience in case of failures
    • Can be used to speed up downloads
    • Can be used to retrieve only partial data (for example the head of a file)
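The byte-range idea can be sketched with a small helper that splits an object into standard HTTP `Range` header values, which could then be fetched in parallel; the same arithmetic drives multipart-upload part sizing. This is illustrative and not tied to any SDK.

```python
def byte_ranges(total_size: int, chunk_size: int):
    """Split an object of `total_size` bytes into HTTP Range header values
    (e.g. "bytes=0-3") that can be fetched in parallel GET requests."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges

# e.g. a 10-byte object fetched in 4-byte chunks:
print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

With boto3, each value would be passed as the `Range` parameter of a separate `get_object` call.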

14. Select & Glacier Select:

  • Retrieve less data by using SQL to perform server-side filtering
  • Can filter by rows & columns (simple SQL statements, evaluated server-side)
  • Less network transfer, and less CPU cost client-side.
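Assuming boto3, a server-side filter is expressed through `select_object_content`; here is a sketch of the parameters (the bucket, key and column names are hypothetical):

```python
# Sketch of the parameters for a server-side filter with S3 Select.
# Pass them as: s3_client.select_object_content(**select_params)
select_params = {
    "Bucket": "my-bucket",        # placeholder bucket
    "Key": "sales.csv",           # placeholder object key
    "ExpressionType": "SQL",
    "Expression": "SELECT s.city, s.amount FROM S3Object s WHERE s.amount > 100",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"CSV": {}},
}
```

Only the matching rows and columns are sent back over the network, which is where the cost savings come from.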

15. Object Lock & Glacier Vault Lock:

  • S3 Object Lock: adopts a WORM (Write Once, Read Many) model; it blocks an object version from being deleted for a specified retention period.
  • Glacier Vault Lock: locks a vault's policy against future edits, which is useful for compliance and data-retention requirements.

Do you know any other S3 functionality that I didn't mention? Feel free to post it in the comments.
