AWS S3 DEMYSTIFIED

What is S3?

S3 is an object-based storage service. Individual objects can range from 0 bytes to 5 TB, and total storage is unlimited. Objects are stored in buckets. Bucket names must be globally unique because all buckets share one universal namespace. When an object is uploaded successfully, S3 returns an HTTP 200 status code.

Amazon S3 is a simple key-based object store. When you store data, you assign a unique object key that can later be used to retrieve the data. Keys can be any string, and they can be constructed to mimic hierarchical attributes. Alternatively, you can use S3 Object Tagging to organize your data across all of your S3 buckets and/or prefixes.
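The idea that flat keys can mimic a hierarchy can be sketched in a few lines of Python. This is only an in-memory illustration and not the S3 API: the `store` dictionary, the key names, and the `list_prefix` helper are all hypothetical.

```python
# In-memory stand-in for a bucket: a flat mapping of object key -> data.
# The keys and the helper below are hypothetical, for illustration only.
store = {
    "photos/2023/cat.jpg": b"...",
    "photos/2023/dog.jpg": b"...",
    "photos/2024/bird.jpg": b"...",
    "notes.txt": b"...",
}


def list_prefix(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) found directly under `prefix`.

    Keys are plain strings. The "folders" exist only because keys are
    grouped by the text up to the next delimiter.
    """
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


print(list_prefix(store))             # top level of the "hierarchy"
print(list_prefix(store, "photos/"))  # one level down
```

Nothing in the store is nested; the grouping comes entirely from how the key strings are written.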

What are S3 objects?

In S3, all files are stored as objects. Each object has:
*Key and value (the key is the object's name, the value is its data)
*Version ID
*Metadata
*Subresources : Access Control Lists

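S3 originally had a condition here: a newly written file could be read immediately, but an overwrite or a delete could take some time to propagate before reads reflected it. Since December 2020, S3 provides strong read-after-write consistency for new objects, overwrites, and deletes, so a read that follows a successful write returns the latest data.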

What are the benefits of S3?

*S3 is designed for 99.999999999% (11 nines) durability.
*It has tiered storage, with seven tiers to choose from depending on your needs.
*It has lifecycle management. You can also set lifecycle expiration policies to automatically remove objects based on their age.
*It has versioning. When analyzing storage costs, note that overwriting an object does not delete the old one. For example, if a 4 GB object is written on Day 1 and a 5 GB object is written under the same key on Day 15, the 4 GB object is preserved as an older version and the 5 GB object becomes the most recently written version within your bucket.
*It has encryption. You can choose to encrypt data using SSE-S3, SSE-C, SSE-KMS, or a client library such as the Amazon S3 Encryption Client. All four enable you to store sensitive data encrypted at rest in Amazon S3.
*It has access control lists (object level) and bucket policies (bucket level) for security.
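A lifecycle rule like the one mentioned in the list above can be sketched as a plain Python dictionary. The layout follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts, but the snippet only builds and prints the configuration; it does not call AWS. The `lifecycle_rule` helper, the `logs/` prefix, and the day counts are assumptions made for illustration.

```python
import json


def lifecycle_rule(prefix, ia_after_days, expire_after_days):
    """Build one rule: move objects to Standard-IA, then expire them."""
    if ia_after_days >= expire_after_days:
        raise ValueError("transition must happen before expiration")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": ia_after_days, "StorageClass": "STANDARD_IA"}],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical values: objects under "logs/" move to Standard-IA after
# 30 days and are removed after 365 days.
config = {"Rules": [lifecycle_rule("logs/", 30, 365)]}
print(json.dumps(config, indent=2))
```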

What are the various storage tiers for S3?

*S3 Standard: for frequently accessed data
*S3 Standard-IA: for infrequently accessed data
*S3 One Zone-IA: for infrequently accessed data, stored in one Availability Zone only
*S3 Intelligent-Tiering: automatically moves objects between access tiers based on how they are used
*S3 Glacier: for very infrequently accessed data
*S3 Glacier Deep Archive: for archiving data that does not change and is not needed rapidly
*S3 Outposts: stores your S3 data on-premises. Amazon S3 on Outposts delivers object storage in your on-premises environment, using the S3 APIs and capabilities that you use in AWS today. AWS Outposts is a fully managed service that extends AWS infrastructure, AWS services, APIs, and tools to virtually any datacenter, colocation space, or on-premises facility.
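When an object is written through the API, the tier is selected with a `StorageClass` value. The mapping below lists the values that correspond to the seven tiers above; the `storage_class_for` helper is a hypothetical convenience for looking them up by display name.

```python
# Display name -> StorageClass value used in S3 API requests.
STORAGE_CLASSES = {
    "s3 standard": "STANDARD",
    "s3 standard-ia": "STANDARD_IA",
    "s3 one zone-ia": "ONEZONE_IA",
    "s3 intelligent-tiering": "INTELLIGENT_TIERING",
    "s3 glacier": "GLACIER",
    "s3 glacier deep archive": "DEEP_ARCHIVE",
    "s3 outposts": "OUTPOSTS",
}


def storage_class_for(tier_name):
    """Look up the API value, ignoring case and extra whitespace."""
    normalized = " ".join(tier_name.lower().split())
    return STORAGE_CLASSES[normalized]


print(storage_class_for("S3 Glacier Deep Archive"))
```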

What determines the charges for S3?

Charges depend on the storage tier you select and on the following:
*REQUESTS (for example PUT and GET calls)
*STORAGE MANAGEMENT PRICING: you can use a single Amazon S3 bucket to store a mixture of S3 Glacier Deep Archive, S3 Standard, S3 Standard-IA, S3 One Zone-IA, and S3 Glacier data. S3 Object Tagging is also part of storage management.
*DATA TRANSFER PRICING (in and out)
*Versioning
*Location (Region)
*Storage usage is measured in "TimedStorage-ByteHrs", which are added up at the end of the month to generate your monthly charges.
For example, assume you store 100 GB (107,374,182,400 bytes) of data in Amazon S3 Standard in your bucket for the first 15 days in March, and 100 TB (109,951,162,777,600 bytes) of data in Amazon S3 Standard for the final 16 days in March.
At the end of March, you would have the following usage in Byte-Hours:
Total Byte-Hour usage = [107,374,182,400 bytes x 15 days x (24 hours / day)] + [109,951,162,777,600 bytes x 16 days x (24 hours / day)] = 42,259,901,212,262,400 Byte-Hours
Converted to GB-Months: 42,259,901,212,262,400 Byte-Hours / 1,073,741,824 bytes per GB / 744 hours per month = 52,900 GB-Months
This usage volume crosses two different volume tiers. The monthly storage price is calculated below, assuming the data is stored in the US East (Northern Virginia) Region:
First 50 TB tier: 51,200 GB x $0.023 = $1,177.60
50 TB to 450 TB tier: 1,700 GB x $0.022 = $37.40
Total storage fee = $1,177.60 + $37.40 = $1,215.00
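The arithmetic of this example can be reproduced with a short Python sketch. The per-GB prices are the ones quoted in the example and may not match current AWS pricing; the `periods`, `tiers`, and `storage_fee` names are made up for this illustration.

```python
BYTES_PER_GB = 1024 ** 3
HOURS_PER_DAY = 24
DAYS_IN_MARCH = 31

# (bytes stored, number of days) for each period in the example.
periods = [
    (100 * 1024 ** 3, 15),  # 100 GB for the first 15 days
    (100 * 1024 ** 4, 16),  # 100 TB for the final 16 days
]

byte_hours = sum(size * days * HOURS_PER_DAY for size, days in periods)
gb_months = byte_hours / BYTES_PER_GB / (DAYS_IN_MARCH * HOURS_PER_DAY)

# Price tiers from the example: (tier size in GB, price per GB-month).
# None means "whatever is left"; the example never goes past this tier.
tiers = [(50 * 1024, 0.023), (None, 0.022)]


def storage_fee(gb, tiers):
    """Charge each slice of usage at the price of the tier it falls in."""
    fee, remaining = 0.0, gb
    for size, price in tiers:
        chunk = remaining if size is None else min(remaining, size)
        fee += chunk * price
        remaining -= chunk
        if remaining <= 0:
            break
    return round(fee, 2)


print(byte_hours)                     # total Byte-Hours
print(gb_months)                      # GB-Months
print(storage_fee(gb_months, tiers))  # monthly storage fee in USD
```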

How is security handled in S3?

Customers may use four mechanisms for controlling access to Amazon S3 resources: Identity and Access Management (IAM) policies, bucket policies, Access Control Lists (ACLs), and Query String Authentication. IAM enables organizations with multiple employees to create and manage multiple users under a single AWS account. With IAM policies, customers can grant IAM users fine-grained control to their Amazon S3 bucket or objects while also retaining full control over everything the users do. With bucket policies, customers can define rules which apply broadly across all requests to their Amazon S3 resources, such as granting write privileges to a subset of Amazon S3 resources. Customers can also restrict access based on an aspect of the request, such as HTTP referrer and IP address. With ACLs, customers can grant specific permissions (i.e. READ, WRITE, FULL_CONTROL) to specific users for an individual bucket or object. With Query String Authentication, customers can create a URL to an Amazon S3 object which is only valid for a limited time.
By default, all newly created S3 buckets are private, with no public access. We can set up access control for a bucket with:
*Bucket policies: applied at the bucket level and covering all objects in the bucket. They are written in JSON (the AWS Policy Generator tool can help).
*Access control lists: applied at the object level, granting permissions to individual users or groups.
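As a rough illustration of a JSON bucket policy that restricts access by IP address, the sketch below builds one in Python. The bucket name `example-bucket`, the address range, and the `read_only_from_ip_policy` helper are hypothetical; the snippet only prints the document and does not attach it to any bucket.

```python
import json


def read_only_from_ip_policy(bucket, cidr):
    """Allow s3:GetObject on every object, but only from one IP range."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetFromKnownRange",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }


# Hypothetical bucket name and documentation-only address range.
policy = read_only_from_ip_policy("example-bucket", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```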
We can also configure access logs to see all the requests made to the S3 bucket, covering every CRUD operation performed on it.

How does S3 handle encryption?

You can choose to encrypt data using SSE-S3, SSE-C, SSE-KMS, or a client library such as the Amazon S3 Encryption Client. All four enable you to store sensitive data encrypted at rest in Amazon S3.

In transit, while requests are being made, data can be protected by requiring SSL/TLS (HTTPS) for all requests to S3.
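One common way to force HTTPS is a bucket policy that denies any request made without TLS. The sketch below builds such a policy as a Python dictionary; the bucket name and the `deny_insecure_transport_policy` helper are hypothetical, and the snippet does not apply the policy to a real bucket.

```python
import json


def deny_insecure_transport_policy(bucket):
    """Deny every S3 action on the bucket when the request is not over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SecureTransport is "false" for plain HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


tls_policy = deny_insecure_transport_policy("example-bucket")
print(json.dumps(tls_policy, indent=2))
```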

At rest, AWS handles encryption using one of the following:
SSE-S3 provides an integrated solution where Amazon handles key management and key protection using multiple layers of security. You should choose SSE-S3 if you prefer to have Amazon manage your keys.

SSE-C enables you to leverage Amazon S3 to perform the encryption and decryption of your objects while retaining control of the keys used to encrypt objects. With SSE-C, you don’t need to implement or use a client-side library to perform the encryption and decryption of objects you store in Amazon S3, but you do need to manage the keys that you send to Amazon S3 to encrypt and decrypt objects. Use SSE-C if you want to maintain your own encryption keys, but don’t want to implement or leverage a client-side encryption library.

SSE-KMS enables you to use AWS Key Management Service (AWS KMS) to manage your encryption keys. Using AWS KMS to manage your keys provides several additional benefits. With AWS KMS, there are separate permissions for the use of the master key, providing an additional layer of control as well as protection against unauthorized access to your objects stored in Amazon S3. AWS KMS provides an audit trail so you can see who used your key to access which object and when, as well as view failed attempts to access data from users without permission to decrypt the data. Also, AWS KMS provides additional security controls to support customer efforts to comply with PCI DSS, HIPAA/HITECH, and FedRAMP industry requirements.
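The three server-side options differ mainly in which encryption parameters accompany an upload. The sketch below builds the extra keyword arguments that boto3's `put_object` accepts for each option. The `sse_put_kwargs` helper and the key alias are hypothetical, and no request is actually sent.

```python
import os


def sse_put_kwargs(mode, kms_key_id=None, customer_key=None):
    """Return the encryption-related arguments for an S3 upload."""
    if mode == "SSE-S3":
        # Amazon manages the keys.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # Keys are managed through AWS KMS; the key ID is optional.
        kwargs = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            kwargs["SSEKMSKeyId"] = kms_key_id
        return kwargs
    if mode == "SSE-C":
        # You supply (and must keep track of) the key yourself.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")


print(sse_put_kwargs("SSE-S3"))
print(sse_put_kwargs("SSE-KMS", kms_key_id="alias/example-key"))  # made-up alias
print(sorted(sse_put_kwargs("SSE-C", customer_key=os.urandom(32))))
```

With a real client these dictionaries would be passed alongside the usual `Bucket`, `Key`, and `Body` arguments.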

Handling CORS in S3?

CORS in S3 is simple: we add a CORS configuration to the bucket (it is a separate setting from the bucket policy), and services on the listed origins can then access the bucket without any cross-origin issues.
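A CORS configuration is a small list of rules. The sketch below builds one in the shape that boto3's `put_bucket_cors` accepts; the origin `https://www.example.com` and the `cors_configuration` helper are assumptions for illustration, and nothing is sent to AWS.

```python
import json

# HTTP methods that an S3 CORS rule can list.
ALLOWED_CORS_METHODS = {"GET", "PUT", "POST", "DELETE", "HEAD"}


def cors_configuration(origins, methods=("GET",), max_age_seconds=3000):
    """Build a single-rule CORS configuration for a bucket."""
    unknown = set(methods) - ALLOWED_CORS_METHODS
    if unknown:
        raise ValueError(f"unsupported CORS methods: {sorted(unknown)}")
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(origins),
                "AllowedMethods": list(methods),
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": max_age_seconds,
            }
        ]
    }


cors = cors_configuration(["https://www.example.com"], methods=("GET", "PUT"))
print(json.dumps(cors, indent=2))
```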

Using CloudFront for S3?

Amazon CloudFront serves S3 content through AWS edge locations around the globe, so anyone can reach the bucket's content faster from anywhere. A request goes to a nearby edge location; if the object is not cached there, AWS uses its internal network to fetch it from the bucket. Operations handled by S3 improve because clients communicate with the edge location instead of reaching the bucket directly.
*Edge locations cache your content.
*Origin: the source of the files that users need to access. The origin can be an S3 bucket or an EC2 instance, among others.
*Distribution: web for websites, RTMP for audio/video streaming (AWS discontinued RTMP distributions at the end of 2020).
*S3 Transfer Acceleration uses CloudFront edge locations to accelerate uploads to S3.
*Objects are cached for the life of the TTL (time to live; the default is 24 hours). You can set the TTL yourself.
*New objects are available right away, but an edited object may keep being served from cache until its TTL expires. You can clear the cache sooner, but you will be charged for it.
*You clear the cache by invalidating objects.
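The interplay of TTL, stale edits, and invalidation can be illustrated with a toy model. This is not how CloudFront is implemented; the `EdgeCache` class, the `origin` dictionary, and the explicit `now` timestamps are all made up to show the behaviour described in the list above.

```python
class EdgeCache:
    """Toy model of one edge location caching origin objects for a TTL."""

    def __init__(self, origin, ttl=24 * 60 * 60):
        self.origin = origin  # dict standing in for the S3 bucket
        self.ttl = ttl        # seconds; default mirrors the 24-hour TTL
        self._cache = {}      # path -> (value, expiry time)

    def get(self, path, now):
        hit = self._cache.get(path)
        if hit and now < hit[1]:
            return hit[0]                 # served from the edge
        value = self.origin[path]         # fetched from the origin
        self._cache[path] = (value, now + self.ttl)
        return value

    def invalidate(self, path):
        self._cache.pop(path, None)       # next get() goes to the origin


origin = {"/index.html": "v1"}
edge = EdgeCache(origin)

first = edge.get("/index.html", now=0)     # cache miss, stores "v1"
origin["/index.html"] = "v2"               # object edited at the origin
stale = edge.get("/index.html", now=3600)  # still "v1": TTL not expired
edge.invalidate("/index.html")
fresh = edge.get("/index.html", now=3601)  # "v2" after invalidation
print(first, stale, fresh)
```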

How to make the most of S3?

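We can make the most of S3 by:
*Using CloudFront.
*Spreading key names across multiple prefixes, for example with a hex hash prefix. Request limits apply per prefix, so more prefixes allow more parallel requests. Since 2018, S3 no longer needs randomized prefixes to perform well, so this matters mainly for very high request rates.
*Designing for the request limits: at least 3,500 PUT requests and 5,500 GET requests per second per prefix.
*Using CloudWatch storage metrics. They are enabled by default for all buckets and reported once per day, and you can configure additional metrics for the conditions you care about.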
*Using CloudWatch alarms. You can set thresholds on any of the storage metrics counts, timers, or rates and trigger an action when the threshold is breached. For example, you can set a threshold on the percentage of 4xx error responses and, when at least 3 data points are above the threshold, trigger a CloudWatch alarm to alert a DevOps engineer.
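A hex hash prefix for key names, as mentioned in the list above, might look like the following sketch. The `hashed_key` helper, the choice of SHA-256, and the four-character prefix length are assumptions rather than an AWS recommendation.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    """Prepend a short hex hash so keys spread across many prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Sequential names end up under unrelated-looking prefixes.
for name in ("2024-01-01/report.csv", "2024-01-02/report.csv"):
    print(hashed_key(name))
```

The mapping is deterministic, so the full key can always be recomputed from the original name when the object needs to be read back.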

LAB

Create two buckets, then:
*Enable CORS on your bucket so that data can be accessed across the two buckets.
*Add an ACL and a bucket policy to your bucket.
*Add encryption to your bucket.
*Create a CloudFront distribution for your bucket, and enable Transfer Acceleration on the bucket.
*In CloudFront, add geographic restrictions to restrict countries.
*In CloudFront, invalidate objects to clear the cache.
*Enable versioning on your bucket to keep all versions of your objects.
*Enable access logs on your bucket.
*Update your bucket policy so that access comes only from CloudFront.
*Create an IAM user to access the bucket from CloudFront.
*In CloudFront, choose the allowed HTTP methods.
*In CloudFront, set the minimum TTL (a low value keeps fast-changing objects up to date).
*In CloudFront, use signed URLs or signed cookies for restricted content.
*Enable AWS WAF to protect the bucket from common web exploits that may affect availability, compromise security, or consume excessive resources.
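For the step that restricts bucket access to CloudFront, one possible policy is sketched below. It assumes the distribution uses origin access control, which the lab does not specify; the bucket name, account ID, distribution ID, and the `cloudfront_only_policy` helper are placeholders. The snippet only prints the JSON.

```python
import json


def cloudfront_only_policy(bucket, account_id, distribution_id):
    """Allow reads only when they come through one CloudFront distribution."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }


# Placeholder identifiers, not real resources.
lab_policy = cloudfront_only_policy("example-bucket", "111122223333", "EDFDVBD6EXAMPLE")
print(json.dumps(lab_policy, indent=2))
```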
For more comprehensive information, please go through the S3 FAQs: https://aws.amazon.com/s3/faqs/