My AWS Notes for Certification
Anshuman Abhishek
Posted on March 20, 2021
Three types of cloud services:
SaaS - Gmail, Dropbox
PaaS - Node.js SDK or Java SDK environments where we just upload the software
IaaS - Full infrastructure services such as load balancers
RDS is PaaS. AWS itself is IaaS, as it provides automated deployment of servers, CPUs, storage, and networking (i.e. the hardware). RDS, Beanstalk, Aurora, etc. are all PaaS.
AWS is costly, which is why some startups move to DigitalOcean and Linode.
AWS has mostly used the Xen hypervisor, and later moved to KVM.
At some point we can't increase resources like RAM and CPU vertically any further.
So horizontal scaling is preferable in these scenarios. In the AWS auto scaling feature we give a parameter, e.g. when CPU utilization reaches 70%, scale horizontally. It supports 1-5 VMs right now.
Gartner Magic Quadrant: some companies use its data to decide which provider to choose.
An Availability Zone serves as the DR (disaster recovery) site for the data. In Mumbai there are 2.
MFA - Two-factor authentication using an authenticator extension for Chrome or a mobile app.
AWS root user has unlimited access. So, we make IAM users with limited access.
If a user is compromised, revoke that user's privileges.
With Wireshark, you can inspect traffic to see which ports are in use and observe the TCP three-way handshake.
In the network ACL, the lower the rule number, the higher its priority. By default, rules 100 and * are present. Here 100 is the lower number, so it is the one that applies.
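The rule ordering above can be sketched in a few lines of Python. This is only an illustration of "lowest number wins", not an AWS API: the `AclRule` class and the `evaluate` function are hypothetical names, and the extra rule 90 is made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AclRule:
    number: Optional[int]  # None stands for the catch-all "*" rule
    port: Optional[int]    # None means the rule matches any port
    action: str            # "allow" or "deny"


def evaluate(rules: List[AclRule], port: int) -> str:
    """Return the action of the lowest-numbered rule that matches the port."""
    numbered = sorted(
        (r for r in rules if r.number is not None), key=lambda r: r.number
    )
    for rule in numbered:
        if rule.port is None or rule.port == port:
            return rule.action
    # No numbered rule matched, so the "*" rule decides (deny if it is absent).
    star = next((r for r in rules if r.number is None), None)
    return star.action if star else "deny"


# Default-style ACL: rule 100 allows everything, "*" denies the rest.
default_rules = [AclRule(100, None, "allow"), AclRule(None, None, "deny")]

# Hypothetical extra rule 90 that blocks SSH; it wins over rule 100.
custom_rules = default_rules + [AclRule(90, 22, "deny")]
```

With `default_rules`, every port is allowed by rule 100. With `custom_rules`, port 22 is denied by rule 90 before rule 100 is ever consulted.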
Object Storage vs Block Storage: In object storage, every file is stored as an object and you can open it in the browser. Block storage is primitive (raw) storage.
Object storage is not as fast as block storage.
If we select instance store, its disk size is fixed but the storage is free. Instance store is temporary storage: if the underlying storage fails, or the instance is stopped or terminated, we lose all the data. So we generally back up the data to S3 when using instance store.
EBS - Elastic Block Store is most reliable.
Elastic Load Balancer (ELB) - Distributes network traffic across multiple EC2 instances. Types: Network, Application, and Classic.
AWS Route 53 is a managed DNS service in the cloud. We don't have to manage it like a traditional DNS server, using the CLI and making entries one by one.
It is a global service and not specific to one region.
AWS Lambda takes it one step further with an event-driven service. When a configured event occurs, the Lambda function runs. The user is charged only for the compute time the function consumes.
If a company produces 200 GB of logs each day, an S3 bucket is the preferred place to store them.
Glacier is used for long-term storage, like 10 years.
Lifecycle policy (example for logs):
-First 3 months: S3
-After 3 months: S3 Infrequent Access
-After a year: Glacier
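As a rough sketch of the lifecycle tiers above, the function below picks a storage class from the age of a log file. It is plain Python, not a real lifecycle configuration; the function name is hypothetical, and treating "3 months" as 90 days and "a year" as 365 days is an assumption.

```python
def storage_class_for_age(age_days: int) -> str:
    """Pick the tier for a log object, following the notes above.

    Assumes 3 months = 90 days and a year = 365 days.
    """
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days < 90:
        return "STANDARD"      # regular S3
    if age_days < 365:
        return "STANDARD_IA"   # S3 Infrequent Access
    return "GLACIER"           # long-term archive
```

For example, a 10-day-old log stays in regular S3, a 200-day-old log sits in Infrequent Access, and a 2-year-old log is in Glacier.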
Provisioned IOPS SSD is much faster.
NoSQL is schema-free, scales horizontally, replicates easily, and manages huge amounts of data. Types: document store, graph store, key-value store, and wide-column store.
In AWS, DynamoDB is the NoSQL database.
CloudWatch is the monitoring tool.
SNS stands for Simple Notification Service. It is for messages and mobile notifications.
Lambda is serverless, which doesn't mean there are no servers, only that we don't have to manage them.
We don't have to take care of servers, capacity needs, deployment, scaling and HA, OS updates, or security. We only bring our code and pay for what we use, never paying for idle resources.
You can use AWS Lambda to run your code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table; to run your code in response to HTTP requests using Amazon API Gateway; or invoke your code using API calls made using AWS SDKs. With these capabilities, you can use Lambda to easily build data processing triggers for AWS services like Amazon S3 and Amazon DynamoDB, process streaming data stored in Kinesis, or create your own back end that operates at AWS scale, performance, and security.
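A minimal sketch of a Python Lambda function reacting to an S3 upload, assuming the usual S3 notification shape with a `Records` list. The bucket and key in the usage note are made up, and the handler only collects object paths instead of doing real processing.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect "bucket/key" for every S3 record in the incoming event."""
    uploaded = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(s3_info["object"]["key"])
        uploaded.append(f"{bucket}/{key}")
    return {"processed": uploaded}
```

Calling the handler locally with a hand-built event such as `{"Records": [{"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "logs/app+1.log"}}}]}` returns the decoded path `my-bucket/logs/app 1.log`.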
Reverse Proxy by Nginx - DDoS Protection
It is a type of proxy server that retrieves resources on behalf of a client from one or more servers. It sits between the client and the application servers.
Its features are:
-hides the existence of the original backend servers
-can protect the backend servers from web-based attacks, DoS, and many more
-can provide great caching functionality
-can optimize the content by compressing it
-can act as an SSL termination proxy
-request routing and many more
CDN - Content Delivery Network
Just as we put static files on an Nginx server for faster delivery and its other features, a CDN does the same thing.
Two examples of CDNs are Cloudflare and CloudFront.
Edge location - used to deliver content from the nearest server, which helps reduce latency. It has 50 edge locations around the world. In India there are 3 edge locations.
If we run curl -i against a URL, it prints the HTTP response headers along with the body.
Amazon Rekognition
Elastic Beanstalk is where we can directly upload applications.
CodeCommit is a managed Git repository service:
aws configure
git clone
echo "hello, world" > test
git status
git add test
git commit -m "Adding file"
git push origin master
Business Intelligence and Data Warehousing are like Google Analytics:
-number of pageviews
-sessions by country or device
-where the traffic is coming from (direct, Google, Bing, email, Quora, etc.)
-session duration
-age and gender of the user visiting
Relational Database(OLTP)
-Contains latest set of data
-useful in running the business
-generally used for read and write
-number of records typically accessed is limited, like tens or twenties
Data Warehouse-
-Contains the historical data
-useful in analysing the business
-generally used for read operations
-number of records accessed can be in millions
awscalculator.com
CloudTrail provides logs of the activity in your account.
For data-level events, you can enable it for S3 and Lambda.
Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices
DoS and DDoS attacks generally drive CPU utilization up.
A DDoS attack can be carried out with software such as Low Orbit Ion Cannon on Kali Linux.
It throws thousands of requests at the target.
One of the biggest DDoS attacks on record took place in February of 2018. This attack targeted GitHub, a popular online code management service used by millions of developers. At its peak, this attack saw incoming traffic at a rate of 1.3 terabits per second (Tbps), sending packets at a rate of 126.9 million per second.
For a Load Balancer, we create two VPCs, one public and one private. Then, within the VPC, create two subnets in two different zones. After creating the subnets, go to the route table and associate the subnets there as well. While defining the Load Balancer, select the two subnets you created. Then define a health check, e.g. protocol HTTP, port 80, ping path /index.html.
For Mongo replication, define the members' IPs and ports in rs.config, or pass the rs config when starting the mongo service.
White Box Testing: Also called Glass Box, Clear Box, or Structural Testing. White Box Testing is based on the application's internal code structure. In white-box testing, an internal perspective of the system, as well as programming skills, are used to design test cases. This testing is usually done at the unit level.
Black Box Testing: Also called Behavioral, Specification-Based, or Input-Output Testing. Black Box Testing is a software testing method in which testers evaluate the functionality of the software under test without looking at the internal code structure.
Grey Box Testing: Grey box is the combination of both White Box and Black Box Testing. The tester who works on this type of testing needs to have access to design documents. This helps to create better test cases in this process.
Lambda - Pay only when code executes; no servers to manage.
Amazon Route 53 (Route 53) is a scalable and highly available Domain Name System (DNS) service.
A top-level domain (TLD) is one of the domains at the highest level in the hierarchical Domain Name System of the Internet. The top-level domain names are installed in the root zone of the name space. For all domains in lower levels, it is the last part of the domain name, that is, the last label of a fully qualified domain name. For example, in the domain name www.example.com, the top-level domain is com. Responsibility for management of most top-level domains is delegated to specific organizations by the Internet Corporation for Assigned Names and Numbers (ICANN), which operates the Internet Assigned Numbers Authority (IANA), and is in charge of maintaining the DNS root zone.
Record types:
-A and AAAA: map a name to an IPv4 address and an IPv6 address, respectively.
-CNAME: a Canonical Name record is a type of resource record in the Domain Name System (DNS) which maps one domain name to another.
-CAA: DNS Certification Authority Authorization (CAA) is an Internet security policy mechanism which allows domain name holders to indicate to certificate authorities whether they are authorized to issue digital certificates for a particular domain name. It does this by means of a "CAA" DNS resource record.
-MX: a mail exchanger record (MX record) specifies the mail server responsible for accepting email messages on behalf of a domain name. It is possible to configure several MX records, typically pointing to an array of mail servers for load balancing and redundancy.
Amazon Simple Notification Service(SNS) - It provides a low-cost infrastructure for the mass delivery of messages, predominantly to mobile users. SNS uses the publish/subscribe model for push delivery of messages. Recipients subscribe to one or more 'topics' within SNS. A ticket booking app could use it for confirmation vouchers, boarding passes or notifications of a delay to a flight. Costs (2016) are quoted as $1.00 to send one million mobile notifications.
EC2
General Purpose: A1, T3, T2, M5, M5a, M4, T3a
Compute Optimized: C5, C5n, C4
Memory Optimized: R5, R5a, R4, X1e, X1, High Memory, z1d
Accelerated Computing: P3, P2, G3, F1
Storage Optimized: H1, I3, D2
Spot Instance - Amazon EC2 Spot instances are spare compute capacity in the AWS cloud available at up to 90% discount compared to On-Demand prices. As a trade-off, AWS offers no SLA on these instances and customers take the risk that it can be interrupted with only two minutes of notification when Amazon needs the capacity back.
Persistent storage - instance-store or EBS. EBS supports a number of advanced storage features, including snapshotting and cloning. EBS volumes can be attached or detached from instances while they are running, and moved from one instance to another. Amazon does not charge for the bandwidth for communications between EC2 instances and S3 storage.
Apache Hadoop supports a special s3: filesystem to support reading from and writing to S3 storage during a MapReduce job. There are also S3 filesystems for Linux, which mount a remote S3 filestore on an EC2 image, as if it were local storage. As S3 is not a full POSIX filesystem, things may not behave the same as on a local disk (e.g., no locking support).
Elastic IP - Amazon's elastic IP address feature is similar to static IP address in traditional data centers, with one key difference. A user can programmatically map an elastic IP address to any virtual machine instance without a network administrator's help and without having to wait for DNS to propagate the binding. In this sense an Elastic IP Address belongs to the account and not to a virtual machine instance. It exists until it is explicitly removed, and remains associated with the account even while it is associated with no instance.
CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, and visualizes it using automated dashboards so you can get a unified view of your AWS resources, applications, and services that run in AWS and on-premises. You can correlate your metrics and logs to better understand the health and performance of your resources. You can also create alarms based on metric value thresholds you specify, or that can watch for anomalous metric behavior based on machine learning algorithms. To take action quickly, you can set up automated actions to notify you if an alarm is triggered and automatically start auto scaling, for example, to help reduce mean-time-to-resolution. You can also dive deep and analyze your metrics, logs, and traces, to better understand how to improve application performance.
Amazon CloudWatch is a web service that provides real-time monitoring to Amazon's EC2 customers on their resource utilization such as CPU, disk, network and replica lag for RDS database replicas. CloudWatch does not provide any memory, disk space, or load average metrics without running additional software on the instance. Since December 2017 Amazon has provided a CloudWatch Agent for Windows and Linux that reports disk and previously unavailable memory information; before that, Amazon provided example scripts for Linux instances to collect OS information. The data is aggregated and provided through the AWS Management Console. It can also be accessed through command line tools and web APIs, if the customer wants to monitor their EC2 resources through their enterprise monitoring software. Amazon also provides an API for operating on CloudWatch alarms.
The metrics collected by Amazon CloudWatch enable the auto-scaling feature to dynamically add or remove EC2 instances. Customers are charged by the number of monitoring instances.
Automated scaling - Amazon's auto-scaling feature of EC2 allows it to automatically adapt computing capacity to site traffic. The schedule-based (e.g. time-of-day) and rule-based (e.g. CPU utilization thresholds) auto scaling mechanisms are easy to use and efficient for simple applications. However, one potential problem is that VMs may take up to several minutes to be ready to use, which is not suitable for time-critical applications. VM startup times depend on image size, VM type, data center location, etc.
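A rule-based scaling decision can be sketched as below. This is a toy model, not the Auto Scaling API: the function name is hypothetical, the 70% scale-out threshold and the 1-5 instance range come from earlier in these notes, and the 30% scale-in threshold is an assumption.

```python
def desired_capacity(
    current: int,
    cpu_percent: float,
    scale_out_at: float = 70.0,  # threshold from the notes
    scale_in_at: float = 30.0,   # assumed value, not from the notes
    min_size: int = 1,
    max_size: int = 5,
) -> int:
    """Return how many instances we want after one evaluation period."""
    if cpu_percent >= scale_out_at:
        target = current + 1     # busy: add one instance
    elif cpu_percent <= scale_in_at:
        target = current - 1     # idle: remove one instance
    else:
        target = current         # within the comfortable band
    # Never leave the configured group size range.
    return max(min_size, min(max_size, target))
```

With 2 instances at 85% CPU the group grows to 3; at 5 instances it stays at 5 because of the upper bound; with 1 instance at 10% CPU it stays at 1.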
Amazon CloudFront is a content delivery network (CDN) offered by Amazon Web Services. Content delivery networks provide a globally-distributed network of proxy servers which cache content, such as web videos or other bulky media, more locally to consumers, thus improving access speed for downloading the content.
CloudFront competes with larger content delivery networks such as Akamai and Limelight Networks.
AWS Elastic Beanstalk is an orchestration service offered by Amazon Web Services for deploying applications which orchestrates various AWS services, including EC2, S3, Simple Notification Service (SNS), CloudWatch, autoscaling, and Elastic Load Balancers. Elastic Beanstalk provides an additional layer of abstraction over the bare server and OS; users instead see a pre-built combination of OS and platform, such as "64bit Amazon Linux 2014.03 v1.1.0 running Ruby 2.0 (Puma)" or "64bit Debian jessie v2.0.7 running Python 3.4 (Preconfigured - Docker)". Deployment requires a number of components to be defined: an 'application' as a logical container for the project, a 'version' which is a deployable build of the application executable, a 'configuration template' that contains configuration information for both the Beanstalk environment and for the product. Finally an 'environment' combines a 'version' with a 'configuration' and deploys them. Executables themselves are uploaded as archive files to S3 beforehand and the 'version' is just a pointer to this.
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code.
The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications that are responsive to events and new information. AWS targets starting a Lambda instance within milliseconds of an event. Node.js, Python, Java, Go, Ruby, and C# (through .NET Core) are all officially supported as of 2018. In late 2018, custom runtime support was added to AWS Lambda, giving developers the ability to run a Lambda in the language of their choice.
AWS Lambda supports securely running native Linux executables via calling out from a supported runtime such as Node.js. For example, Haskell code can be run on Lambda.
AWS Lambda was designed for use cases such as image or object uploads to Amazon S3, updates to DynamoDB tables, responding to website clicks or reacting to sensor readings from an IoT connected device. AWS Lambda can also be used to automatically provision back-end services triggered by custom HTTP requests, and "spin down" such services when not in use, to save resources. These custom HTTP requests are configured in AWS API Gateway, which can also handle authentication and authorization in conjunction with AWS Cognito.
Unlike Amazon EC2, which is priced by the hour but metered by the second, AWS Lambda was originally metered in increments of 100 milliseconds; since December 2020 it has been metered in increments of 1 millisecond. Usage amounts below a documented threshold fall within the AWS Lambda free tier, which does not expire 12 months after account signup, unlike the free tier for other AWS services.
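Metering in fixed increments means a run is rounded up to the next increment before it is billed. The sketch below shows only that rounding and the resulting GB-seconds; the function names are hypothetical, the increment is passed in explicitly, and no prices are included because they are not part of these notes.

```python
import math


def billed_duration_ms(duration_ms: float, increment_ms: int) -> int:
    """Round a run time up to the next metering increment."""
    if duration_ms < 0 or increment_ms <= 0:
        raise ValueError("duration must be >= 0 and increment must be > 0")
    return math.ceil(duration_ms / increment_ms) * increment_ms


def gb_seconds(duration_ms: float, memory_mb: int, increment_ms: int) -> float:
    """Billed compute in GB-seconds for one invocation."""
    seconds = billed_duration_ms(duration_ms, increment_ms) / 1000
    return seconds * memory_mb / 1024
```

A 120 ms run is billed as 200 ms with a 100 ms increment, so at 512 MB it consumes about 0.1 GB-seconds.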
Each AWS Lambda instance is a container created from Amazon Linux AMIs (a Linux distribution related to RHEL) with 128-3008 MB of RAM (in 64 MB increments), 512 MB of ephemeral storage (available in /tmp, the data lasts only for the duration of the instance, it gets discarded after all the tasks running in the instance complete) and a configurable execution time from 1 to 900 seconds. The instances are neither started nor controlled directly. Instead, a package containing the required tasks has to be created and uploaded (usually) to an S3 bucket and AWS is instructed (via Amazon Kinesis, DynamoDB or SQS) to run it when an event is triggered. Each such execution is run in a new environment so access to the execution context of previous and subsequent runs is not possible. This essentially makes the instances stateless, all the incoming and outgoing data needs to be stored by external means (usually via S3 or DynamoDB, inbound connections to the instances is disabled). The maximum compressed size of a Lambda package is 50 MB with the maximum uncompressed size being 250 MB.
Amazon Virtual Private Cloud (VPC) is a commercial cloud computing service that provides users a virtual private cloud, by "provision[ing] a logically isolated section of Amazon Web Services (AWS) Cloud". Enterprise customers are able to access the Amazon Elastic Compute Cloud (EC2) over an IPsec based virtual private network. Unlike traditional EC2 instances which are allocated internal and external IP numbers by Amazon, the customer can assign IP numbers of their choosing from one or more subnets. By giving the user the option of selecting which AWS resources are public facing and which are not, VPC provides much more granular control over security. For Amazon it is "an endorsement of the hybrid approach, but it's also meant to combat the growing interest in private clouds".
Amazon Virtual Private Cloud aims to provide a service similar to private clouds using technology such as OpenStack or HPE Helion Eucalyptus. However, private clouds typically also use technology such as OpenShift application hosting and various database systems. Cloud security experts warned there can be compliance risks, such as a loss of control or service cancellation in using public resources which do not exist with in-house systems. If transaction records are requested from Amazon about a VPC using a National Security Letter, they may not even be legally allowed to inform the customer of the breach of the security of their system. This would be true even if the actual VPC resources were in another country. The API used by AWS is only partly compatible with that of HPE Helion Eucalyptus and is not compatible with other private cloud systems, so migration from AWS may be difficult. This has led to warnings of the possibility of lock-in to a specific technology.
Initially, users are able to choose a range of IP addresses for their VPC. Within this range, users can assign various private and public IPv4 and IPv6 addresses to instances in the VPC in order to communicate with the Internet and other instances of VPCs. These addresses are assigned to specific instances rather than the user's entire VPC account. Static assignment of public IP addresses is not possible; instead the address is assigned and unassigned in certain cases, causing the address of an instance to change. When a consistent IP address is needed, a third type of IP address, Elastic IP addresses, can be used in place of public IP addresses.
AWS VPC allows users to connect to the Internet, a user's corporate data center, and other users' VPCs.
Users are able to connect to the Internet by adding an Internet Gateway to their VPC, which assigns the VPC a public IPv4 Address.
Users are able to connect to a data center by setting up a Hardware Virtual Private Network connection between the data center and the VPC. This connection allows the user to "interact with Amazon EC2 instances within a VPC as if they were within [the user's] existing network."
Users are able to route traffic from one VPC to another VPC using private IP addresses, and are able to communicate as if they were on the same network. Peering can be achieved by connecting a route between two VPCs on the same account or two VPCs on different accounts in the same region. VPC Peering is a one-to-one connection, but users are able to connect to more than one VPC at a time.
Security
AWS VPC's security is two-fold: firstly, AWS VPC uses security groups as a firewall to control traffic at the instance level, while it also uses network access control lists as a firewall to control traffic at the subnet level. As another measure of privacy, AWS VPC provides users with the ability to create "dedicated instances" on hardware, physically isolating the dedicated instances from non-dedicated instances and instances owned by other accounts.
AWS VPC is free, with users only paying for the consumption of EC2 resources. However, if choosing to access VPC via a Virtual Private Network (VPN), there is a charge.
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud.
Amazon RDS database engines
Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server.
Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud, that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.
Amazon Aurora is up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases. It provides the security, availability, and reliability of commercial databases at 1/10th the cost. Amazon Aurora is fully managed by Amazon Relational Database Service (RDS), which automates time-consuming administration tasks like hardware provisioning, database setup, patching, and backups.
Amazon Simple Queue Service (Amazon SQS) - It supports programmatic sending of messages via web service applications as a way to communicate over the Internet. SQS is intended to provide a highly scalable hosted message queue that resolves issues arising from the common producer-consumer problem or connectivity between producer and consumer.
Amazon SQS guarantees at-least-once delivery. Messages are stored on multiple servers for redundancy and to ensure availability. If a message is delivered while a server is not available, it may not be removed from that server's queue and may be resent.
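Because at-least-once delivery means the same message may arrive twice, consumers are usually written to be idempotent. The sketch below is an in-memory illustration of that idea, not SQS client code: the `IdempotentConsumer` class is hypothetical, and a real system would keep the seen IDs in durable storage.

```python
class IdempotentConsumer:
    """Processes each message ID once, even if it is delivered again."""

    def __init__(self):
        self.seen_ids = set()   # IDs already handled (in memory only)
        self.processed = []     # bodies we actually acted on

    def handle(self, message_id: str, body: str) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False        # redelivery: skip the side effect
        self.seen_ids.add(message_id)
        self.processed.append(body)
        return True


consumer = IdempotentConsumer()
first = consumer.handle("msg-1", "charge order 42")
again = consumer.handle("msg-1", "charge order 42")  # resent by the queue
```

Here `first` is `True` and `again` is `False`, so the order is charged only once even though the message was delivered twice.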
Kafka is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages at any point (well, as long as the messages are still on disk). You can use it both as a queue (using consumer groups) and as a topic.
An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
SQS/SNS on the other hand:
-no setup/no maintenance
-either a queue (SQS) or a topic (SNS)
-various limitations (on size, how long a message lives, etc.)
-limited throughput: you can do batch and concurrent requests, but still achieving high throughputs would be expensive
-I'm not sure if the messages are replicated; however, the at-least-once delivery guarantee in SQS would suggest so
-SNS has notifications for email, SMS, SQS, HTTP built in. With Kafka, you would probably have to code it yourself
-no "message stream" concept
So overall I would say SQS/SNS are well suited for simpler tasks and workloads with a lower volume of messages.
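The Kafka point above, that you can only acknowledge everything up to an offset, can be illustrated with a small tracker. This is a plain Python simulation, not the Kafka client API; `OffsetTracker` is a hypothetical helper that advances the committed offset only across a contiguous run of processed messages.

```python
class OffsetTracker:
    """Tracks the highest offset that is safe to commit."""

    def __init__(self):
        self.committed = 0   # next offset to read after a restart
        self.done = set()    # processed offsets not yet covered by a commit

    def mark_processed(self, offset: int) -> int:
        """Record one processed offset and return the new commit position."""
        self.done.add(offset)
        # Advance only while there is no gap: offset 2 cannot be
        # acknowledged on its own while offset 1 is still pending.
        while self.committed in self.done:
            self.done.remove(self.committed)
            self.committed += 1
        return self.committed


tracker = OffsetTracker()
after_0 = tracker.mark_processed(0)
after_2 = tracker.mark_processed(2)   # offset 1 is still outstanding
after_1 = tracker.mark_processed(1)   # gap closed
```

The commit position moves to 1 after offset 0, stays at 1 when offset 2 finishes early, and jumps to 3 once offset 1 is done.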
API - Amazon provides SDKs in several programming languages including Java, Ruby, Python, .NET, PHP and JavaScript. A Java Message Service (JMS) 1.1 client for Amazon SQS was released in December 2014.
Authentication
Amazon SQS provides authentication procedures to allow for secure handling of data. Amazon uses its Amazon Web Services (AWS) identification to do this, requiring users to have an AWS enabled account with Amazon.com; this can be created at Amazon Web Services (AWS) - Cloud Computing Services. AWS assigns a pair of related identifiers, your AWS access keys, to an AWS enabled account to perform identification. The first identifier is a public 20-character Access Key. This key is included in an AWS service request to identify the user. If the user is not using SOAP (protocol) with WS-Security, a digital signature is calculated using the Secret Access Key. The Secret Access Key is a 40-character private identifier. AWS uses the Access Key ID provided in a service request to look up an account's Secret Access Key. Amazon.com then calculates a digital signature with the key. If they match then the user is considered authentic, if not then the authentication fails and the request is not processed.
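The sign-and-compare flow described above can be sketched with an HMAC. This is a simplified illustration and not the actual AWS signing process: the function names, the key values, and the choice of HMAC-SHA256 over a single message string are all assumptions.

```python
import hashlib
import hmac


def sign(secret_key: str, message: str) -> str:
    """Client side: compute a signature over the request with the secret key."""
    return hmac.new(
        secret_key.encode(), message.encode(), hashlib.sha256
    ).hexdigest()


def verify(secrets_by_key_id: dict, access_key_id: str, message: str, signature: str) -> bool:
    """Server side: look up the secret by access key ID and recompute."""
    secret = secrets_by_key_id.get(access_key_id)
    if secret is None:
        return False  # unknown access key ID
    expected = sign(secret, message)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


# Made-up credentials for the example.
known_secrets = {"AKIDEXAMPLE": "not-a-real-secret"}
request = "SendMessage&QueueName=orders"
request_signature = sign("not-a-real-secret", request)
```

`verify(known_secrets, "AKIDEXAMPLE", request, request_signature)` succeeds, while a changed request body or an unknown access key ID fails, which is the match-or-reject behavior the paragraph describes.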
SNS - Publisher/subscriber system. Publishing a message to a topic can deliver it to many subscribers (fan-out) of different types (SQS, Lambda, email).
SQS - Queuing service for message processing. A system must poll the queue to discover new events. Messages in the queue are typically processed by a single consumer.
When you order something (an event), it is published to an SNS topic and sent to a Lambda function, which customizes the raw data, fetches additional details, formats it, and sends an email/SMS to the customer.
SNS also sends that event to an SQS analytics queue, which is consumed by a subscriber EC2 instance.
There may be another SQS queue, called the Fraud Detection Queue.
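The order flow above can be mimicked in memory to show the fan-out pattern. This is a toy model, not the SNS or SQS API: the `Topic` class, the queue names, and the order fields are all hypothetical.

```python
from collections import deque


class Topic:
    """Toy stand-in for an SNS topic: every subscriber gets every message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, deliver):
        self.subscribers.append(deliver)

    def publish(self, message):
        for deliver in self.subscribers:
            deliver(message)


analytics_queue = deque()   # stands in for the SQS analytics queue
fraud_queue = deque()       # stands in for the fraud detection queue
sent_emails = []            # what the "Lambda" subscriber would send


def format_and_email(order):
    """Plays the Lambda role: format the raw event for the customer."""
    sent_emails.append(f"Order {order['id']} confirmed for {order['customer']}")


orders = Topic()
orders.subscribe(format_and_email)
orders.subscribe(analytics_queue.append)
orders.subscribe(fraud_queue.append)

# One published event reaches all three subscribers.
orders.publish({"id": 42, "customer": "alice"})
```

After the single `publish` call, the email list holds one formatted message and each queue holds one copy of the order, waiting to be polled by its own consumer.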
In SNS, subscription options are HTTP/S, Email, Email-JSON, SQS, Lambda, Platform application endpoints, SMS.