A Ton of Notes While Studying GCP

Hello, World...

So just a disclaimer on this... none of this will really get you ready for the GCP Certified Architect Exam... It's really just some decent "Base" information. I just passed, but as much as you need to know (or not know) all of this stuff to various degrees, nothing I read, watched, or was trained on was enough to prepare me for the test.

In a nutshell, you basically need to know all of this and more... and they may never ask you about any of it directly... or they may dive into the exact command line syntax for gsutil or kubectl.

I've also found so much conflicting information in Google's and other practice exams, and even in various documentation, that it can be tough to know exactly what the truth is. It's all so hazy. So please take all of this with a grain of salt. But most of all, know how to build out scenarios from all you learn and to forecast into the future.

A lot of this is in flight, so it may be outdated by the time I click publish. Heck, some of the material I was learning from was outdated when I got it. Good luck!

And so with that... I'll just be collecting together a boatload of tidbits I learned while studying for the GCP exam:

3 Recommended areas of console focus:

IAM (federation, quotas, roles)
Compute:
- App Engine
- Compute Engine
- Kubernetes Engine (formerly Container Engine)
- Cloud Functions
- Networking (VPC, VPN, CloudDNS, Routes, Firewall)
Storage
- Bigtable - NoSQL (HBase)
- Buckets!
- SQL & Spanner

Regions and Zones:

Region - independent geographic area that contains zones.
Zone - deployment area for your resources within a region.
You need to protect against the loss of an entire zone, or an entire region.
Zone, multi-zone, and region would be the breakdown.
Not every product is available in every region across the Americas, Europe, and Asia.
Resources are broken down into Zonal, Regional, and Multi-Regional.
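
A small sketch of how zones map back to regions. Zone names are the region name plus a letter suffix (for example us-central1-a sits in us-central1). The helper names here are my own for illustration, not part of any Google SDK.

```python
def region_of(zone: str) -> str:
    """Return the region part of a zone name such as 'us-central1-a'."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def group_by_region(zones):
    """Group zone names by region, to see how resources spread across regions."""
    grouped = {}
    for zone in zones:
        grouped.setdefault(region_of(zone), []).append(zone)
    return grouped


print(group_by_region(["us-central1-a", "us-central1-b", "europe-west1-b"]))
```

If everything lands under a single key, you are only protected against a zone failure, not a regional one.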

IAM Quotas:

Resources are either unlimited or subject to quotas.
Know how to locate and address them.
Projects are going to have quotas.
- Resource quotas limit the number of resources created.
- Some resources have global and regional quotas.
- Most can be increased by request.
- API request quotas limit rate and daily volume.
- Free quotas allow free use up to limit.
- Example: 5GB of Cloud Storage and 50,000 entity reads.

Hierarchy in GCP:

Orgs - a TLD essentially.
- Can use G-Suite or cloud identity.
- Link your org domain.
- Can set access and configs at org or project level.
- Billing accounts, projects and resources are not deleted when an employee leaves the company.
- If you're not using G-Suite, orgs may do you no good.
Folders - introduced when you are using cloud IAM (comparable to AWS Directory Services).
- Assign policies to resources at a granularity of your choosing.
- Resources in a folder can share IAM policies.
- Sits below the Org and above projects in the hierarchy.
Projects - pretty much the main interaction point.
- Resources go here for tracking
- Billing and accounting is usually tethered here too.
- Manage permissions and credentials.
- Enable services and APIs.
- The main goal is to provide a sandbox.
- You provide the project name
- Project ID - GCP provides, usually the AppID.
- GCP provides the project number.
- A network can only belong to one project.
- Default limit of 24 CPUs per project.
- APIs are specific to resources in a given project.
- They have quotas.
- Project names must be between 4 and 30 characters.
- A Project can have up to 5 VPC networks.
Resources.

Compute Options (Heavily tested)

Compute Engine: These are VMs that are focused on your enterprise IaaS (can be Linux or Windows Server)

VMs
IaaS
Templates or custom
Cloud Launcher (marketplace)
vCPU and memory.
Networking
OS (Lin/Win)
TCP/UDP/ICMP
Note: Supports IPv4 only
Every VM instance belongs to a network.
Storage:
- Standard (zonal persistent)
- SSD (zonal persistent)
- Local SSD
- Can resize or migrate with no downtime.
If in the same network, they can communicate over the local LAN.
If wanting to connect to the internet, we'll need to provision external IPs.
Global, Regional, & Zonal resources
- Global resources include pre-configured disk images, disk snapshots, and networks.
- Regional resources include static external IP addresses.
- Zonal resources include VM instances, their types, and disks.
Compute engine VM comes with a single root persistent disk
Image is loaded onto root disk during the boot process.
- Bootable - you can attach to a VM and boot from it.
- Snapshots - incremental backups.
- Durable - can survive VM termination
- Some SW is installed and the OS is configured by GCE
Each persistent disk can be up to 64TB in size.
Each instance can attach only a limited amount of total persistent disk space and a limited number of persistent disks.
A single FS gives the best perf on a persistent disk.
Local SSDs = high IOPS and low latency.
You don't really need to know the speeds... but know the use cases... if you are using a local SSD, know why... persistent disk, know zonal and regional. etc.
Migrating VMs:
- Manual & Automatic
- Don't use on a VM with a local SSD
- The local SSD data cannot be backed up and will just be discarded.
- Persistent disks have to be attached to only the VM you are going to move (multiples are not supported)
- Sufficient quota must exist for all the resources copied during duplication, or the process will fail.
Snapshots:
- Snapshot is not available for local SSD
- Creates an incremental backup to GCS
- Snapshots can be restored to a new persistent disk
- Don't use for database migration across zones
- Cannot be shared across projects.
VM access - CLI linux - need FW rule port 22
- SSH (from console, CloudShell, SDK, or your own computer).
- gcloud compute ssh (instance name)
VM access - windows RDP - need FW rule port 3389.
- PowerShell PSSession
- gcloud compute ssh (instance name)
Auto restart of VM - what the VM should do after a hardware failure or system event. If enabled, it will try to launch a replacement VM. Auto-restart does not restart the VM if it was terminated by a user action (shutdown or termination).
- Note: if the VM availability policy is set to the default, live migrate, then during regular system maintenance your VM will be migrated to different hardware so there is no downtime.
Creating VM's - items to consider:
- Each zone supports a combination of processor generations.
- Creating an instance in a zone, your instance will use the default processor supported in that zone. Some support GPU, some TPU, etc.
  - ex: us-central1-a (Iowa) will have Sandy Bridge.
Instance Groups:
- uses an instance template to create or update the instances that are part of the group.
- Create template once and reuse it for multiple groups and configs.
- Template is global, and not bound to zone or region.
- You can specify a zonal resource in a template tho, and this will restrict it.
- By default, instances from the group will be placed in the default network and randomly assigned IPs from the regional range.
- Three types: Managed Regional, Managed Zonal, and Unmanaged
  - Unmanaged: dissimilar instances that you can arbitrarily add/remove from the group. These do not offer autoscaling, rolling updates, or use instance templates. Only use these if you have dissimilar types of instances or if you need load balancing to pre-existing configurations. Will support Linux variants and Win versions
  - Managed groups are the recommendation.
Migrating VM's to GCP:
- Importing virtual disks - use the import tool. It supports most formats VMDK, VHD, and RAW. This is going to use CloudBuild: this is used to help deploy infra. Also a pre-check tool for incompatibilities.
- Velostrata - Agentless cloud migration and the service is free to use for customers migrating into GCP from another cloud service.
- CloudEndure - agent based managed service which supports on-prem-to-cloud and cloud-to-cloud migrations.
Know the billing approaches and the tools:
- Bill per second, min of one min.
- Preemptible (24hrs) - kinda like spot.
- Discounts - Committed use - kinda like reserved. The more it's used, the more you save. They don't want them just sitting idle.
- Savings - up to 80% and no prepaid contract.
- Inferred instance - for billing purposes, the same type of machine used in the same zone will be combined into a single charge.
You will get recommendations for optimizing (rightsizing) VMs. These are generated automatically.
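
To make the "bill per second, min of one min" rule concrete, here is a rough sketch. The hourly rate is a made-up number, and treating a zero-second run as free is my own assumption, so check the pricing pages for real figures.

```python
MINIMUM_BILLED_SECONDS = 60  # "min of one min" from the notes above


def billable_seconds(runtime_seconds: int) -> int:
    """Seconds charged for one VM run under per-second billing with a 1-minute floor."""
    if runtime_seconds < 0:
        raise ValueError("runtime cannot be negative")
    if runtime_seconds == 0:
        return 0  # assumption: a VM that never ran is not billed
    return max(runtime_seconds, MINIMUM_BILLED_SECONDS)


def estimated_cost(runtime_seconds: int, hourly_rate: float) -> float:
    """Cost estimate; hourly_rate is a hypothetical price, not a real SKU."""
    return billable_seconds(runtime_seconds) * hourly_rate / 3600


print(billable_seconds(30))   # a 30-second run is billed as a full minute
print(billable_seconds(61))   # past the first minute, billing is per second
```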

App Engine: PaaS - two different environments: Standard or Flexible. Mobile apps fit well here.

PaaS
Fully managed, just worry about your code
Supports code written in supported langs:
- Python
- Java
- Node.js
- Go
- Ruby
- PHP
- .Net
Standard or Flexible Environments.
SDK kits so that you can dev locally.
App Engine is Regional
- Google will manage it redundantly across all zones in that region.
- You cannot change a region after you set it.
Free and paid resources available.
Supports the Spring Framework and MemCache
Support and SLA
Supports:
- Traffic splitting
- Application versioning
- Integrates with StackDriver
Heavily tested and meant for you to deploy your web applications.
Instances are health-checked and healed as necessary, and co-located with other services within the project.
Critical, backwards compatible updates are automatically applied to the underlying operating system.
VM instances are automatically located by geographical region according to the settings in your project.
VM instances are restarted on a weekly basis and GCP will apply any necessary operating system and security updates.
Root access to ComputeEngine VM instances.
SSH to VM instances in the flex env is disabled by default.
Should I use Standard or Flexible:
- Standard: means that your app instances run in a sandbox, using the runtime env of a supported lang.
- Flex: means that your app instances run within Docker containers on GCE VM's.
- It adds the ability to pick Ruby or .Net as well as pick any version of the langs that you choose.
Cloud Security Scanner: a cool feature that can help you find security problems in your app so that you can head off potential attacks.
Ways to store your data in AppEngine:
- Cloud DataStore for NoSQL / Schemaless.
- Cloud Storage for files and metadata
- Cloud SQL for relational db data such as MySQL/PGSQL
- Connect 3rd party: Redis, Mongo, Cassandra, Hadoop
Scaling:
- App.yaml controls type and instance class (resources)
- 3 Types (depends on the instance class):
- manual
- basic
- automatic
App Engine A/B Testing
- Traffic splitting
- specify percentage distribution across 2 or more versions within a service
- control the pace of the rollout
- splitting is applied to URLs that do not explicitly target a version
- NOTE: Required privs and check caching issues
3 different approaches to splitting traffic:
- IP address
- Cookie
- Random
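
Here is a toy illustration of the IP address approach to splitting: hash the client IP to a point in [0, 1) and pick the version whose weight bucket contains it, so the same IP keeps landing on the same version. This is only my sketch of the idea, not how App Engine actually implements it, and the version names are hypothetical.

```python
import hashlib


def pick_version(client_ip: str, split: dict) -> str:
    """Pick a version for a client IP. split maps version name -> weight (sums to 1.0)."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a point in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, weight in sorted(split.items()):
        cumulative += weight
        if point < cumulative:
            return version
    return sorted(split)[-1]  # guard against float rounding at the top edge


split = {"v1": 0.5, "v2": 0.5}  # hypothetical 50/50 A/B test
print(pick_version("203.0.113.9", split))
```

Cookie-based splitting would follow the same shape with a cookie value in place of the IP, and random splitting would simply draw the point at random on every request.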
Traffic migration depends on environment
- Standard: you can choose to route requests to the target version either immediately or gradually. Cannot expose these machines to the world.
- Flexible: Gradual migrations are not supported - can also attach ephemeral disks. Can expose these machines to the world.
- Note: warm-up requests improve user response time by allowing the version currently receiving traffic to handle those requests.
Traffic migration or traffic splitting???
- They will try and get you on this!
- Traffic migration switches the request routing between the versions within a service. Essentially moving traffic from one or more versions to a single new version.
- Traffic splitting is between two or more versions of your application for A/B testing. Traffic splitting is applied to URLs that do not explicitly target a version.
Each instance has an initial start up cost of 15 minutes
Note which case study uses this. Be aware if they locate other GCP services in other regions that this can depend on.
Contains Blobstore, but recommends Google Cloud Storage
Shared or dedicated Memcache
Can set up custom domains or register them here too. Upload certs as well.

Kubernetes Engine - container management. May be a good fit for CI/CD build pipeline with containers.

The amount of content here has increased.
Manage applications, not machines.
Why Use?
- Workload portability
- Run in many envs, across cloud providers
- Implementation is open and modular
- Rolling updates:
- Upgrade application with zero downtime
- Autoscaling
- Automatically adapt to changes in workload
Terms
- Pod - hosts containers.
- Volume - any data access mounted to a pod. Can persist. Are available to all containers in a pod.
- Container - if > 1 per pod, they are guaranteed to be scheduled together on the same VM
- Cluster - Containers package an application so it can be easily deployed to run in its own isolated environment. Containers are managed in clusters that automate VM creation and maintenance. A Kubernetes cluster is a managed group of VM instances for running containerized applications.
- Pools - instance groups in a kubernetes cluster
- All VM's in a pool are the same
- Pools can contain different VMs from one another
- Pools can be in different zones; GKE is node pool aware.
- Labels on VMs in the pool make them available to GKE
- Node pools and Multi-zone container clusters
  - Node pools are separate instance groups running Kubernetes in a cluster. You may add node pools in different zones for higher availability, or add node pools of different type machines.
- GKE will replicate all the pools along with all the clusters.
  - Be careful, this could use up quotas in the region.
- Workloads - what runs inside of the container cluster.
- Applications - Can deploy from the marketplace. Kubernetes Applications collect containers, services and configuration that are managed together.
- Services - Service instances are applications that your Kubernetes application can connect to and use. In order to work with service instances, you have to install service catalog on your cluster first.

	Kubernetes Engine	App Engine Std	App Engine Flex
Language	Any	Java, Python, Go, PHP, Node.js	Any
Service Model	Hybrid	PaaS	PaaS
Use Case	Containers	Web & Mobile	Web & Mobile container based

It seems like the differentiation they are driving at here is that AppEngine is app focused, and while they do the similar bits, Kube will give you more dials to tweak.
Kube Scaling:
- GKE cluster autoscaler automatically resizes clusters based on the demands of the workloads.
- Cluster autoscaling allows you to pay only for the resources that are needed.
- Cluster autoscaler supports up to 1000 nodes running 30 pods each
- Cluster autoscaler supports a graceful termination period for a pod of up to 10 minutes when scaling down.
Cluster architecture
- Consists of at least one cluster master and multiple worker machines called nodes.
- These master and node machines run the Kube orchestration system.
- Resource allocation calc:
- Allocatable = Capacity - Reserved - Eviction Threshold
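
The resource allocation calc above, worked through as a quick sketch. The numbers are made up for illustration and are not GKE's actual reservations for any machine type.

```python
def allocatable(capacity: int, reserved: int, eviction_threshold: int) -> int:
    """Allocatable = Capacity - Reserved - Eviction Threshold (all in the same unit)."""
    result = capacity - reserved - eviction_threshold
    if result < 0:
        raise ValueError("reservations exceed node capacity")
    return result


# Hypothetical node: 4096 MiB of memory, 1024 MiB reserved, 100 MiB eviction threshold.
print(allocatable(capacity=4096, reserved=1024, eviction_threshold=100))  # 2972
```

The takeaway is that pods never get the full machine, so size node pools against the allocatable figure rather than the raw VM spec.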
Deployments - Set of multiple identical pods with no unique identities
- Lifecycle: Inspect, Scale, Autoscale (via Horizontal autoscaler), Delete
- States: progressing, completed, failed.
- Template: when changed, new pods are automatically created one at a time.
- Deployment manifest file .yaml
- Understand stateless vs. stateful usage patterns
- Stateless - R/O many or R/W many volume with multiple replicas?
Know that data access mounted to a pod is a volume and it's available to all containers in the pod.
Know that it runs Docker containers
Know how it fits into the CI/CD pipeline and leverages CloudBuild, Container Registry, etc.
Secrets?
Identity management

Cloud Functions - serverless / microservices. Web hooks, small functional microservice style features.

Event-based microservices
Fully managed, serverless, secure, FaaS.
Triggers - Cloud pub/sub, HTTP, CloudStorage
Code Deploy functions from a CloudStorage Bucket, GitHub, or BitBucket repo
Written in JS and run in Node.
StackDriver integration
Cloud Functions come in two distinct variants: Foreground(HTTP) and Background.
Allows you to write simple, single purpose functions that are attached to events emitted from your cloud infra and services.
Funcs are triggered when an event being watched fires.
Unit of work is a "Function", or "Snippets" of code.
Examples:
- Obj uploaded to Cloud Storage
- Event is generated on create and the event data is transmitted to the function.
- Function is triggered by that event and invoked/run/exec'd
Other methods of invocation:
- HTTP - invoked directly via request
- Cloud Storage
- Cloud Pub/Sub
- StackDriver Logging
- Cloud Firestore
- Compute Engine
- BigQuery
- Firebase(DB, Analytics, Auth)
Some Drawbacks to CloudFunctions:
- it is NOT a low latency service
Being serverless, there are fewer resources that can be adjusted for the price/performance tradeoff.
Serverless, but not exactly performant.
Case study will ask what "Compute Service" to use.
- Key words - Microservices, Legacy to Cloud Application, and Serverless.

Networking:

The VPC is global.
A GLOBAL private isolated virtual network partition that provides managed networking functionality for your GCP resources.
A sandbox.
Regions, IP addresses, subnets...
A network can only belong to one project.
By Default: Limit of 5 networks and 100 subnets per project.
An instance can attach to only one network.
Max 7000 instances per network.
For instances to communicate over private IP, they must be in the same project and on the same network.
- When not on the same network, they must communicate over external IP.
Static or ephemeral external IP's are available.
GCE includes an internal FQDN DNS resolver.
You can associate firewall rules to tags on resources.... or by things that use a certain service account.
You have to define other protocols to allow, like icmp, by name.
Understand DNS zones and prefixes.
- Managed zones.
3 types of networks:
- Default
- Auto
- A VPC network is created, and one subnet per region is automatically created within it.
- Uses predefined IP range
- automatically adds new regions with subnets
- Can add manually
- Custom
- Custom config
- VPC network is created, but no subnets are automatically created.
- Uses your custom IP range
- You have control and add subnets as required.
Shared VPC - allows an organization to connect resources from multiple projects to a common VPC network via internal IPs from that network.
Hybrid support
Private peering.
- allows private RFC1918 connectivity across two VPC networks regardless of whether or not they belong to the same project or the same organization.
- an example is when one company buys out another and they are both on the platform.
- Orgs with several network admin domains or Orgs that want to peer with other Orgs.
Directly over the Google backbone. (differentiation)
Global resources: preconfigured disk images, disk snapshots, and networks.
Regional resources: static external IP addresses
Zonal resources: VM instances, their types, and disks.
Use networks to isolate systems.
Virtual Network Objects: (9)
- Projects
- Networks
- Subnetworks
- Regions
- Zones
- IP Addresses
- Virtual Machines
- Firewalls
- Routes

Internal IP	External IP
Allocated from subnet range to VMs by DHCP	Assigned from pool (Ephemeral)
Renewed every 24hrs	Reserved (static) Billed when not attached to a running VM
VM name + IP is registered with network-scoped DNS	VM doesn't know external IP, it is mapped to the internal IP

Know how to attach an external IP to a VM: you provision it and it gets mapped from the internal to the external, you can't just grant an external IP directly to a VM.
Supported protocols:
- TCP
- UDP
- ICMP
Note: Supports IPv4 only
Every VM instance belongs to a network.
Default network is used if none are selected.
Subnets:
- Group related resources together
IP ranges
- Auto
- Custom
Routing
- Control flow of data and direct traffic to where you want it
- Default routes work in most cases, but if you need a custom route, you can create one.
Firewalls = (TAGS!)
- use user defined tags
- Used to group vms
- apply to vms
- not limited to a topology like an IP address
- can also bind to a service account
- Rules are a global resource
- control ingress and egress traffic with priority
- Default: ingress rules are allow-only, matching on IP CIDR ranges, protocols, ports, and target.
- Tags - ICMP, SSH, RDP
- Supports allows for ingress, not denies.
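
A rough model of how tag-based allow rules get matched, following the notes above: rules target VMs by tag, match on source CIDR, protocol, and port, and are checked in priority order (lower number first). The rule names, tags, and addresses are hypothetical, and this is a teaching sketch rather than how GCP evaluates rules internally.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowRule:
    name: str
    priority: int            # lower number is checked first
    source_range: str        # CIDR the traffic must come from
    protocol: str            # e.g. "tcp", "udp", "icmp"
    port: Optional[int]      # None for portless protocols such as icmp
    target_tag: str          # rule applies to VMs carrying this tag


def first_matching_rule(rules, vm_tags, source_ip, protocol, port=None):
    """Return the name of the first allow rule that matches, or None if ingress is blocked."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.target_tag not in vm_tags:
            continue
        if rule.protocol != protocol:
            continue
        if rule.port is not None and rule.port != port:
            continue
        if addr not in ipaddress.ip_network(rule.source_range):
            continue
        return rule.name
    return None  # nothing matched, so the traffic is not allowed in


rules = [
    AllowRule("allow-ssh-office", 1000, "203.0.113.0/24", "tcp", 22, "bastion"),
    AllowRule("allow-icmp-any", 65534, "0.0.0.0/0", "icmp", None, "web"),
]
print(first_matching_rule(rules, {"bastion"}, "203.0.113.7", "tcp", 22))
```

Because the match keys on tags rather than addresses, adding a new VM with the bastion tag picks up the SSH rule without touching the rule itself.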
Billing - for traffic egress
- to the internet (varies by region)
- from one region to another (in the same network)
- different rates for the same continent regions vs intercontinental
- between zones within a region
You are not billed for:
- traffic ingress
- VM to VM traffic in a single zone (same region, network)
- traffic to GCP services (limits apply, see docs)
Bastion hosts
- connect with external IP
- need to scale ssh (limit by ssh and CIDR)
- can connect with site to site vpn
- could also use a NAT gateway.
IP Address reminders:
- Each instance has a hostname that can be resolved to an internal IP address.
- Hostname is the same as the instance name.
- FQDN is: [hostname].c.[project-id].internal
- ex: test-machine.c.my-user-project-220928.internal
- Name resolution is handled by an internal DNS resolver.
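
The internal FQDN pattern above is easy to get backwards, so here it is as a tiny sketch. The function names are mine, and the parser assumes a project ID without dots.

```python
def internal_fqdn(hostname: str, project_id: str) -> str:
    """Build [hostname].c.[project-id].internal from the notes above."""
    return f"{hostname}.c.{project_id}.internal"


def parse_internal_fqdn(fqdn: str):
    """Split an internal FQDN back into (hostname, project_id)."""
    parts = fqdn.split(".")
    if len(parts) != 4 or parts[1] != "c" or parts[3] != "internal":
        raise ValueError(f"not an internal FQDN: {fqdn!r}")
    return parts[0], parts[2]


print(internal_fqdn("test-machine", "my-user-project-220928"))
```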

Hybrid Connectivity

Cloud interconnect
- 10 Gbps
GCP has an interconnect (like AWS DirectConnect) called Cloud Interconnect to extend your data center network into your Google Cloud projects.
IPsec VPN
- Can use your own solution or application VPN
Direct access to RFC1918 IPs in your VPC (there is an SLA)
- You will be connecting up to a Google endpoint
Partner Interconnect - more focused on partners - (SLA is only to the partner)
- 50 Mbps min

Peering
Network latency is reduced
Network security (can keep services pvt to the internet and internal)
Network cost (may be lower cost if all internal, not always)
Peered VPCs are administered separately
- Routes, firewalls, VPNs and other traffic are handled individually in each VPC (may be duplication)
Each side of a peering association is set up independently. Peering will be active only when both sides match.
A given VPC network can peer with multiple VPC networks.
Coordination is key to peering.
Can be used to build SaaS ecosystems.
- Shared VPC
VPC Networking allows peering with a shared VPC
A shared VPC host project is a project that allows other projects to use one of its networks.
Note: a single instance can have two network interfaces, each one being in a separate VPC.
- Cloud VPN
Google Cloud VPN securely connects your on-premises network to your GCP VPC via an IPSec VPN connection.
Traffic travelling between the two networks is encrypted by one VPN gateway, then decrypted by the other.
Protects your data as it travels over the internet
Cloud VPN only supports IPSec gateway-to-gateway scenarios. You must have a dedicated physical or virtual IPSec VPN gateway on the client side.
High throughput, reliable, managed service.
IKE (internet key exchange) v1 & v2 supported
Can run over Cloud Interconnect
ECMP over multiple VPN tunnels to achieve greater overall throughput
Leverages google's edge locations across the globe to minimize latency.
Supports private addressing (RFC1918)
A separate instance of Cloud Router / VPN is required in each region
3 Gbps per tunnel, can increase with ECMP.
Cannot use Jumbo packets
Some firewall and UDP configs needed.
Ciphers may not be supported if old.
With static routing, updating the tunnel requires the addition of the static routes to GCP and restarting the VPN tunnel to include the new subnet.
Some points to know for the exam:
- Public IP on both peers
- Global or regional service
- 1.5 Gbps throughput
- Secret password
- Scale horizontally through ECMP parallel tunnels
- Use Dynamic routes to scale regionally or globally.
  - Will also need Cloud Router
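
For the "scale horizontally through ECMP parallel tunnels" point, a back-of-the-envelope sketch of how many tunnels a target throughput needs. I've seen different per-tunnel figures quoted, so the per-tunnel number is a parameter here, and the math assumes throughput adds up linearly across tunnels, which is idealized.

```python
import math


def tunnels_needed(target_gbps: float, per_tunnel_gbps: float) -> int:
    """Number of parallel VPN tunnels needed, assuming linear scaling with ECMP."""
    if target_gbps <= 0 or per_tunnel_gbps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_gbps / per_tunnel_gbps)


# Hypothetical requirement: 4 Gbps to on-prem with tunnels that each carry 1.5 Gbps.
print(tunnels_needed(4.0, 1.5))  # 3
```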
Know how many regions you want to connect to
Simple setup, Auto mode, Custom mode
Auto mode with gateway subnet or auto mode with more than one subnet.

IAM

SSO
- Use your own auth system and manage creds
- Federate identities to GCP
- Users won't have to login a second time
- Can revoke access using existing mechanism
- Google Apps Directory Sync integrates with LDAP
- GCDS - GSuite admin can automatically add, modify or delete users, groups, etc. sync'd with an LDAP directory server or MS AD.
- The data in the LDAP / AD is never modified or compromised.
- GCDS is a secure tool that keeps track of users and groups.
- The GSuite admin can use GCDS Config Mgr to customize syncs, and can perform test syncs to find out what works best for the org.
- These can then be scheduled when needed.
- Built on SAML2
- The only assertion that is used is the username
- Will need cert to validate signature
- Can use 3rd party plug in such as Ping or Okta.
- Roles:
- Primitive:
  - The original roles available in the GCP.
  - Owner, Editor, Viewer.
  - These are broad.
- Curated:
  - New IAM roles that give fine-grained access control.
Service account - server to server account
- Will auth apps running on your VMs to other GCP Services.
- ex: App reads and writes to Cloud Storage, so it must auth to the CloudStorage API. You can enable service accounts to grant R/W access to the account on the instance where you plan to run your app.
- The program then obtains creds from the service account and the app can seamlessly use the API without the need for keys or creds in the instance, image, or app code.
- By default all projects come with a service account
- When you start a new instance using gcloud, the default service account is enabled on that instance
- Apart from the default service account, all projects come with a Google APIs service account, identifiable using the email {project-number}@cloudservices.gserviceaccount.com
- Default service accounts support primitive and curated IAM roles.
- Roles for service accounts can be assigned to groups or users.
- Note: One of the features of IAM service accounts is that you can treat them as resources or identities. Google manages keys and key rotation for Compute Engine and App Engine; alternatively, you can create and manage these yourself.
Identity-Aware Proxy - a way to secure all online identities and secure them with MFA.
- Identity-Aware Proxy (IAP) lets you manage who has access to services hosted on App Engine, Compute Engine, or an HTTPS Load Balancer.
- To get started with IAP, add an App Engine app, a Compute Engine instance or configure an HTTPS Load Balancer.

Resources

GCP Cloud Resource Manager
- IAM - flows (inherited) down!
- Resources inherit policies from parent
- Resource policies are a union of parent and resource
- If the parent policy is less restrictive, it overrides a more restrictive resource policy.
- Billing and resource monitoring - Flows up!
- Resource consumption is measured on:
  - Rate of use / time
  - Number of items
  - Feature use
- A resource belongs to one and only one project
- Project accumulates consumption of all resources
- Project is associated with one billing account
- Organization contains all billing accounts
- An Organization is created by a contract with Google Sales
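
The "IAM flows down" idea from the top of this list, as a toy model: the effective policy on a resource is the union of everything granted at each level above it, so a child can add access but cannot take away what a parent granted. The role and member strings are hypothetical examples and the data shape is simplified compared to real IAM policies.

```python
def effective_bindings(*policies):
    """Union role -> members bindings, ordered from organization down to resource."""
    merged = {}
    for policy in policies:
        for role, members in policy.items():
            merged.setdefault(role, set()).update(members)
    return merged


org_policy = {"roles/viewer": {"group:everyone@example.com"}}
project_policy = {"roles/editor": {"user:dev@example.com"}}
resource_policy = {"roles/viewer": {"user:auditor@example.com"}}

print(effective_bindings(org_policy, project_policy, resource_policy))
```

Even though the resource policy only names the auditor as a viewer, the group granted at the org level still shows up, which is the "less restrictive parent overrides" behavior.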

2 organizational roles:
- Organization Admin - control over all cloud resources
- Project Creator - Controls project creation.
3 types of resources:
- Global - accessible by any resource in any zone within the same project. When you create a global resource, you do not need to provide a scope specification.
- Images
- Snapshots
- VPC network BUT Subnets are regional
- Firewalls - apply to a single VPC, but are considered global because packets can reach them from other networks.
- Routes
- Global Operations
- Regional - in a specific region (in the Americas)
- Subnets are regional
- Addresses (static external)
- Regional operations
- Zone
- Instances
- Disks
- Machine types
- Per zone operations
Quota - used to protect you and other customers and google
- prevents runaway resource consumption
- prevents billing spikes
- Enforces sizing consideration and periodic review.
- How to check:
- Go to quotas page on the console or
- gcloud compute project-info describe --project <project name>
- to check your used quota in a region, run:
  - gcloud compute regions describe <region>
Labels - a utility for organizing GCP resources
- Attached to resources: VM, disk, snapshot, image
- console or API, may not be in gcloud?
- a Key:Value pair that you can attach
Billing - linked to the project ID
- Can set up alerts
- Alerts will send a trigger
- Can export billing info
- Big Query (extra cost) or
- File Export (csv or json)
- Report is generated daily, not on demand
- Project Name and Labels are your post-export parsing tools.

StackDriver

StackDriver is a hybrid monitoring, logging, and diagnostic tool for applications on GCP and AWS
Google purchased Stackdriver and rebranded it as Google StackDriver
StackDriver monitors the cloud's service layers in a single SaaS solution
Native integration with GCP data tools BigQuery, Cloud Pub/Sub, Cloud Storage, Cloud Datalab, and OOTB integrations with your other app components.
Access from GCP Console.
It's tied to projects
Free for 30 days and then downgraded to the basic version.
Monitors multi-cloud
Identify trends and prevents issues.
Lowers monitoring headaches
Fix problems faster
Reduces monitoring noise.
Major features:
- Monitoring
- Debugging
- Logging
- Trace
- Error Reporting
You'll need to create a StackDriver account for the project
You can also create a StackDriver account that monitors multiple projects
Defaults are intelligent and dynamic - it will automatically scan and populate once bound to our GCP accounts.
Health checks
Metrics = Platform, system, application
- Ingest Data Metrics, events and metadata
- Then provides insight through dashboards, charts and alerts.
Manual monitoring agent install
- For AWS EC2 and GCP VMs
- App engine has built in support
- curl -O https://repo.stackdriver.com/stack-install.sh
- sudo bash ./stack-install.sh --write-gcm
- The agent is based on the original collectd system statistics collection daemon. stackdriver-agent
- There is no container engine support
- Actually: Google Kubernetes Engine provides an option to install two versions of Stackdriver support on clusters and nodes. The option is presented when creating or updating clusters using either the GKE console or the gcloud container command.
- Only specific OSes and versions are supported, so validate against the supported list first
Uptime checks verify from 6 GCP global locations
When you make a change to an uptime check, the delay could be 25 mins
Logging:
- Supports platform, system, and app logs
- 7 day for basic, 30 day premium retention (cloud storage for longer)
- Search, view and filter
- Log based metrics
- Alerts on log events
- Basic and Premium versions (pay additional)
- Manual install of Logging Agent:
- curl -sSO https://dl.google.com/cloudagents/install-logging-agent.sh
- sudo bash ./install-logging-agent.sh
- Again, you don't have to do this on AppEngine
Don't use substrings
Set up filters
Advanced viewing interface
Export logs to cloud storage
BigQuery (Search and analyze)
DataLab (Visualize)
Pub/sub (App or Endpoint Streams)
Aggregate and Display errors for running cloud services
- Error notifications
- Error dashboard
- Java, Python, JS, Ruby, C#, PHP, and Go
Trace:
- What part of StackDriver would you need to consider if you had an application that needed to be analyzed?
- Gather and analyze TRACE flows
- Bottleneck discovery
- Analyzes apps and generates reports
- App Engine projects captured
- TRACE SDK: Java, Node, Ruby, Go
- Displays data in near real time
- Latency reports
- Latency sampling (URL)
- Data is collected:
- App Engine
- HTTP Load Balancers and Stackdriver Trace SDKs.
- Debugging:
- Inspect apps and not have to stop it
- App Engine Standard or Flexible
- Java, Python, Go
- Snapshots
- Logpoints

Storage Options

Cloud Storage
- 4 specific options
- Multi-Regional - all locs
- Regional - one loc
- Nearline - once a month
- Coldline - once a year
- Features
- Object control
- Object versioning
- Object life cycle management
  - Config changes can take 24hrs to apply
  - Can create rulesets on age, size, location.
  - Object inspection is done in async batches
- Object change notification
- Imports (from aws or region to region via migration service)
  - Offline media import service similar to Snowball.
  - Third party providers.
- Usually used as the ingestion point for all the other GCP services.
- To accomplish AWS EFS on Google, you have to use the FUSE adapter.
- Ingress is free, egress is charged.
- Data xfer within a region is free
- Petabytes of data
- Read - have to copy to local disk
- Write - one file
- Upgrade granularity - 1 object
- Usage - storage blob
- Limit 5TB per obj
- Security
- Encryption of data (at rest)
  - Uses key mgmt
  - Encrypts at the application layer
  - HW encryption support for HDs and SSDs.
  - They track each drive through its lifecycle
- Deletion of data
  - "Scheduled for deletion"
  - Deleted in accordance with service policies
- Titan security chips
- There is a lot more to this, but it's so similar to S3 and *nix I didn't bother with a ton of notes.
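To make the object life cycle management bullets above a bit more concrete, here is a minimal sketch that builds a lifecycle config as JSON. The day thresholds (30, 365, 3650) and the function name `build_lifecycle_config` are my own assumptions for illustration. The rule / action / condition layout follows the JSON lifecycle format as I understand it, so check the current docs before applying anything.

```python
import json


def build_lifecycle_config(nearline_after_days=30, coldline_after_days=365,
                           delete_after_days=3650):
    """Build a Cloud Storage lifecycle config as a plain dict.

    The day thresholds are illustrative assumptions, not recommendations.
    """
    return {
        "rule": [
            {
                # Demote objects to Nearline once they reach the first age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Demote Nearline objects to Coldline at the second age.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {
                    "age": coldline_after_days,
                    "matchesStorageClass": ["NEARLINE"],
                },
            },
            {
                # Delete anything older than the final threshold.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


config = build_lifecycle_config()
print(json.dumps(config, indent=2))
```

If you save that output to a file, something like `gsutil lifecycle set lifecycle.json gs://my-bucket` would apply it (the file and bucket names here are made up). Remember the note above: changes can take up to 24hrs to kick in.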
Cloud SQL (RDBMS)
- MySQL -5.6/5.7 (second gen, more perf)
- PostgreSQL - 9.6
- Regional - Set region and zone (not multi) - Vertical Scale
- Not all regions / zones are supported.
- Cloud SQL instances are fully managed, relational MySQL and PostgreSQL databases. Google handles replication, patch management and database management to ensure availability and performance
- pay per use model
- Rest API
- Affordable & high perf
- Adaptive, Vertical scaling R/W
- Horizontal scaling R
- Seamless integration with AppEngine and ComputeEngine
- Supports IPv4 & IPv6
- Automated backup and recovery
- Availability protection
- Partner Ecosystems
- Fully managed
- Google scale security
- Automated backups occur on a daily basis during chosen windows
- On-demand backups are also available via the console or API
- SQL supported features:
- Stored procs, Triggers, Views
- Doesn't support: user-defined functions, internal MySQL replication, statements and funcs related to files and plugins
- CMS, eCommerce, Web frameworks
Cloud Spanner (RDBMS+)
- Need to enable CloudSpanner API in the project
- Global - Horizontal Scale
- GBs
- Strongly consistent
- SQL Support
- Managed Service
- Secure Global Transactions
- Managed by Google's SRE team
- Horizontal scaling.
- Strict ACID compliance.
- Can contain one or more tables.
- Data is strongly typed (Strong Schema)
- Supports SQL syntax, but is at a different level of compatibility
- Not ANSI SQL compatible
- Some code written for spanner is not portable
- Highly consistent internal clock on all nodes.
- Supports a larger feature set (such as time bound queries, that allow one to perform faster reads)
- Interaction through the cloud spanner API
- Keep an eye out for cross regional requirements.
- Ad Tech, Financial transactions
- Cloud Spanner is a fully managed, mission-critical relational database service designed for transactional consistency at a global scale. It offers traditional relational semantics (schemas, ACID transactions, SQL) and automatic, synchronous replication for high availability.
Exam
- Main focus should be "does the customer require global scale with high transactions per second? Or does the customer just want a traditional RDBMS that does not scale horizontally?"
- Does the customer require an Open Source solution?
- Migration - Cloud SQL runs open source engines (MySQL, PostgreSQL), so it suits a "Lift and Shift" of existing source code; Cloud Spanner is proprietary and some existing code may need changes.
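The exam questions above boil down to a small decision rule. This is just a sketch of how I'd reason through it; the function name and its three boolean parameters are my own invention, not anything from Google.

```python
def choose_relational_db(needs_global_scale, needs_horizontal_write_scaling,
                         requires_open_source_engine):
    """Return "Cloud SQL" or "Cloud Spanner" using the exam heuristic above."""
    # An open source engine (MySQL/PostgreSQL) requirement points at Cloud SQL.
    if requires_open_source_engine:
        return "Cloud SQL"
    # Global scale or horizontally scaled writes point at Cloud Spanner.
    if needs_global_scale or needs_horizontal_write_scaling:
        return "Cloud Spanner"
    # Otherwise a traditional, vertically scaled RDBMS is enough.
    return "Cloud SQL"


print(choose_relational_db(True, True, False))
print(choose_relational_db(False, False, False))
```

Real scenarios will mix in cost and migration effort, so treat this as a starting point for reading the question, not a rule.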
Cloud DataStore
- Document Database - structured data - persistent hashmap
- TBs
- Highly Scalable
- NoSQL
- Strong consistency
- Automatically handles Sharding and replication
- Highly available and durable
- Scales automatically
- Web or Mobile apps
- Games or User profiles
- Schemaless DB
- Pay per use, Rest API
- Entity reads and writes and operations
- storage use
- Read - filter objects
- Write - put object
- Attributes - Entity(row) - Kind (table)
- Tightly coupled with AppEngine
- Replication:
- Multiple locations
- Multi regional(more reliable) or regional(lower write latency)
- Global points of presence(pops) - lower latency for end user.
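The "persistent hashmap" and Kind (table) / Entity (row) ideas above can be pictured with a toy in-memory model. To be clear, `ToyDatastore` is a made-up class for illustration only. It is not the Datastore client library and it ignores persistence, indexes, and replication.

```python
from collections import defaultdict


class ToyDatastore:
    """In-memory stand-in: kind (table) -> key -> entity (row)."""

    def __init__(self):
        self._kinds = defaultdict(dict)

    def put(self, kind, key, entity):
        # Write: store the whole entity under its key.
        self._kinds[kind][key] = dict(entity)

    def get(self, kind, key):
        return self._kinds[kind].get(key)

    def query(self, kind, **filters):
        # Read: return entities whose properties match every filter.
        return [
            entity
            for entity in self._kinds[kind].values()
            if all(entity.get(name) == value for name, value in filters.items())
        ]


db = ToyDatastore()
db.put("Player", "p1", {"name": "ada", "level": 3})
# Schemaless: this entity carries an extra property the others lack.
db.put("Player", "p2", {"name": "bob", "level": 3, "guild": "red"})
db.put("Player", "p3", {"name": "cy", "level": 7})

level_three = sorted(e["name"] for e in db.query("Player", level=3))
print(level_three)
```

The point is only the shape: writes put a whole object, reads filter objects, and entities of the same kind don't have to share a schema.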
Cloud BigTable - HBASE
- Cloud Bigtable is a fully managed NoSQL database that supports the popular open-source Apache HBase 1.0 API. You can provision Cloud Bigtable instances for your workload, then use the Bigtable HBase client to develop applications using the standard open-source Big Data tools you're familiar with.
- Petabytes
- Low latency
- Fully managed
- Seamless scalability for throughput
- Learns and adjusts to access patterns (AI like)
- Low latency storage stack
- Redundant autoscaling
- Used for low latency big data
- Good for heavy read and write events
- Key Value
- Rows (scans and puts)
- No-ops, high throughput, scalable, flattened data
- utility to locally emulate BigTable for dev
- Data API
- Streaming / sequential
- Batch Processing
- Colossus file system - tablets
- Processing is separate from storage.
- Integrates with Hadoop, Google cloud dataflow, and Dataproc
- Cloud DataFlow: Cloud Dataflow provides scalable data-processing pipelines for small and large jobs. Use the Cloud Dataflow SDK to define jobs, and then monitor them on the console.
- BQ command line tool
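The "Key Value" and "Rows (scans and puts)" bullets above make more sense once you remember that Bigtable keeps rows sorted by row key, so a scan over a key prefix reads one contiguous range. Here's a toy sketch of that idea. `ToyBigtable` and the `sensor#date` key layout are assumptions I made up for illustration, not the Bigtable or HBase API.

```python
import bisect


class ToyBigtable:
    """Rows kept sorted by row key, so prefix scans read a contiguous range."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then walk while keys still match.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


table = ToyBigtable()
table.put("sensor1#2018-01-02", {"temp": 21})
table.put("sensor2#2018-01-01", {"temp": 19})
table.put("sensor1#2018-01-01", {"temp": 20})

sensor1_keys = [key for key, _ in table.scan_prefix("sensor1#")]
print(sensor1_keys)
```

This is why row key design matters so much: whatever you want to scan together should share a key prefix.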

DevOps

Tools available
- CloudSource repos - Free
- Cloud Source Repositories helps you privately host, track, and manage changes to large codebases on Google Cloud Platform.
- Deploy and debug in minutes - Built-in integrations with other GCP tools let you automatically build, test, deploy, and debug code within minutes.
- Fast code search - Use regular expressions to search across multiple projects, files, and repositories to quickly review and debug code.
- Fully-managed Git - Access fully-featured, private Git repositories on Google Cloud. Bring in existing code from GitHub or Bitbucket repositories.
- Unlimited private repositories - Create an unlimited number of private Git repositories to host and maintain your code.
- Stackdriver Debug, Trace, Log
- CloudEndpoints - (AWS API Gateway)
- Google Cloud Endpoints let you manage and control access to your own APIs. You can keep APIs private or share them with partners, and you can monitor API usage
- CLI only
- NGINX based proxy
- Used to create a web backend
- Used for web and mobile clients
- Deploy with AppEngine
- Tools and Libs
- Allows Access to:
  - AppEngine instances
  - DataStore
  - Cloud Storage
  - Task Queues
- Reduces dev cycle
- Java and Python
- CloudBuild - container builder / deployer - Can do this with multiple Env's
- Run your container image builds in a fast, consistent, and reliable environment on Google Cloud Platform. Build in any language and package your build artifacts into Docker containers for deployment. Use Google Cloud SDK to integrate with your favorite developer tools and any continuous delivery system.
- Unified CI/CD pipelines Kubernetes Engine / Container Registry
- Google Container Registry provides secure, private Docker repository storage on Google Cloud Platform. You can use gcloud to push images to your registry, then you can pull images using an HTTP endpoint from any machine, whether it's a Google Compute Engine instance or your own hardware
- You can create a project for each env (dev, prod, test, qa)
- Make sure they have the right perms
- Blue green deployment models
- app engine can split traffic
- App engine security scanner can look for issues as well.
Cloud Pub/Sub
- A fully managed real time messaging service
- decouples sender and receiver
- Asynch
- Scales globally
- Low latency
- Dynamic rate limits
- Durable / replicated / reliable / secure
- Publisher -> Topic (message store) -> Subscription (pull/push) -> Subscriber (ack)
- Message has a payload and attributes
- Use Cases:
- balance workloads
- Implement Async workflows
- Distribute event notifications (fan out)
- Refreshing distributed cache
- Logging to multiple systems
- Data streaming from various processes or devices
- Reliability improvement
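The Publisher -> Topic -> Subscription -> Subscriber (ack) flow above is easier to remember with a toy model. This is an in-memory sketch only. `ToyTopic`, `ToySubscription`, and `nack_all` are names I invented, and the real service handles delivery, retries, and durability very differently.

```python
from collections import deque


class ToySubscription:
    def __init__(self):
        self._next_id = 0
        self._pending = deque()   # delivered to the subscription, not yet pulled
        self._unacked = {}        # pulled, waiting for an ack

    def deliver(self, message):
        self._pending.append(message)

    def pull(self):
        # Hand out one message; it stays outstanding until acked.
        if not self._pending:
            return None
        ack_id = self._next_id
        self._next_id += 1
        message = self._pending.popleft()
        self._unacked[ack_id] = message
        return ack_id, message

    def ack(self, ack_id):
        self._unacked.pop(ack_id, None)

    def nack_all(self):
        # Put unacked messages back at the front so they are redelivered.
        self._pending.extendleft(reversed(list(self._unacked.values())))
        self._unacked.clear()


class ToyTopic:
    """Topic that fans each published message out to every subscription."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = ToySubscription()
        return self.subscriptions[name]

    def publish(self, payload, **attributes):
        message = {"payload": payload, "attributes": attributes}
        for subscription in self.subscriptions.values():
            subscription.deliver(message)


topic = ToyTopic()
billing = topic.subscribe("billing")
audit = topic.subscribe("audit")
topic.publish("order-created", order_id="42")

ack_id, message = billing.pull()
billing.ack(ack_id)            # billing is done with the message

audit.pull()                   # audit pulls but never acks...
audit.nack_all()               # ...so the message goes back for redelivery
redelivered = audit.pull()
print(message, redelivered[1])
```

Two things to take away: each subscription gets its own copy (fan out), and a message that is never acked comes back.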

Storage Migration

gsutil or Cloud Storage Transfer Service?
- When transferring data from on prem use gsutil
- Use gsutil if you have a lot of data restructuring or renaming.
- When transferring data from another cloud provider use Storage Transfer Service (STS) - it really was optimized to xfer from S3.
- (if on prem and have bandwidth issues, may want Import/Export)
Cloud Storage Transfer Service
- Source to Sink (target)
- Can back up data from another storage provider
- Can move data from Multi-Regional Storage Bucket to a Nearline Storage Bucket to lower your costs. Or even from one region to another.
- Can Schedule one-time xfers or recurring
- Delete existing objs in the destination, if none correspond in source
- Delete source objs once xferred
- Schedule periodic syncs with advanced filters (file dates, names)
- Default is to copy from source to sink if it doesn't exist in sink or differs. Default is also to retain files in the source.
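The defaults and options above can be sketched as a little planning function. This is my own simplification, not the Storage Transfer Service API. `plan_transfer`, its parameters, and the idea of comparing a "fingerprint" per object are all assumptions for illustration.

```python
def plan_transfer(source, sink, delete_extra_in_sink=False,
                  delete_source_after_transfer=False):
    """Plan a source-to-sink sync.

    `source` and `sink` map object names to a content fingerprint
    (for example a checksum string).
    """
    # Default: copy objects that are missing from the sink or differ there.
    to_copy = sorted(
        name for name, fingerprint in source.items()
        if sink.get(name) != fingerprint
    )
    # Optional: remove sink objects with no counterpart in the source.
    to_delete_from_sink = []
    if delete_extra_in_sink:
        to_delete_from_sink = sorted(name for name in sink if name not in source)
    # Optional: remove source objects once transferred. Assumption: this
    # sketch only deletes the objects it actually copied.
    to_delete_from_source = list(to_copy) if delete_source_after_transfer else []
    return {
        "copy": to_copy,
        "delete_from_sink": to_delete_from_sink,
        "delete_from_source": to_delete_from_source,
    }


source = {"a.csv": "v1", "b.csv": "v2", "c.csv": "v1"}
sink = {"a.csv": "v1", "b.csv": "old", "stale.csv": "v9"}

default_plan = plan_transfer(source, sink)
mirror_plan = plan_transfer(source, sink, delete_extra_in_sink=True)
print(default_plan)
print(mirror_plan)
```

With the defaults nothing is ever deleted on either side, which matches the "retain files in the source" note above.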
Cloud Storage FUSE (CSFUSE)
- Allows you to mount cloud storage buckets as a file system on Linux or OSX systems for normal file cp, rm, mv, operations.
- Can be used on Compute Engines or on prem systems.
- Not officially supported by Google, but built by them and released via Apache.
Offline Media Import / Export
- Probably the last resort
- Used for large amounts of data, but tight on bandwidth
- Done by third parties
- Iron Mountain (NA)
- Prime Focus (EMEA/APAC)
- Zadara (NA/EMEA)

Load Balancing & Autoscaling

Load Balancing

GCP Load Balancing is a Managed Service
Connection draining - delays termination of an instance until remaining connections are closed
- New connections are prevented
- Instance preserves existing sessions until they end OR a designated timeout is reached
- Minimizes interruption for users.
- Triggered when an instance is removed from an instance group
- Manual removal, resizing, autoscaling.
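Connection draining as described above is basically a small state machine: refuse new connections, then wait for sessions to end OR the timeout. Here's a toy sketch. `DrainingInstance` and the 60 second timeout are made up for illustration, and real draining is configured on the load balancer side, not coded like this.

```python
class DrainingInstance:
    """Toy model of an instance being removed from an instance group."""

    def __init__(self, active_connections, drain_timeout_seconds):
        self.active_connections = set(active_connections)
        self.drain_timeout_seconds = drain_timeout_seconds
        self.draining = False
        self.elapsed_seconds = 0

    def start_draining(self):
        self.draining = True

    def accept(self, connection_id):
        # New connections are refused once draining starts.
        if self.draining:
            return False
        self.active_connections.add(connection_id)
        return True

    def close(self, connection_id):
        self.active_connections.discard(connection_id)

    def tick(self, seconds):
        # Time only counts toward the timeout while draining.
        if self.draining:
            self.elapsed_seconds += seconds

    def can_terminate(self):
        # Terminate when sessions have ended OR the timeout has passed.
        if not self.draining:
            return False
        return (not self.active_connections
                or self.elapsed_seconds >= self.drain_timeout_seconds)


instance = DrainingInstance({"c1", "c2"}, drain_timeout_seconds=60)
instance.start_draining()
accepted = instance.accept("c3")
instance.close("c1")
instance.tick(30)
ready_early = instance.can_terminate()
instance.tick(30)
ready_at_timeout = instance.can_terminate()
print(accepted, ready_early, ready_at_timeout)
```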
Types
- Network
- External & Internal
- distributes incoming traffic across multiple instances
- supports non-HTTP(S) protocols (TCP/IP)
- Can be used for HTTPS when you want to terminate connection on your instances and NOT at the load balancer (pass thru)
- Supports autoscaling with managed instance groups.
- If the scenario is a mobile app, this probably won't be the right choice.
- Regional service, not global
- Will preserve the client IP address (pass thru)
- supports various ports
- forwarding rules consist of Name, Region
- Regional IP address
- IP (TCP, UDP, ICMP)
- Can target a Pool or an Instance.
  - Target pools consist of
  - Name
  - Description
  - Region
  - Instances (must all be in the same region)
  - Session Affinity (NONE, CLIENT_IP_PROTO, CLIENT_IP)
  - BackupPool
  - FailoverRatio
- HTTP(S)
- ports 80, 8080, 443
- Distributes HTTP(S) traffic among instance groups based on proximity to the user, the URL, or both
- Autoscalers can be attached to the HTTP(S) load balancers.
- There is a global forwarding rule
- Target Proxy (w/ SSL Cert resource for HTTPS proxy)
- URL map - route traffic based on specific URLs
- Backend service and backends
- Health Check
- Global IP Address (ephemeral or static)
- One or more instance groups
- Global forwarding (a rule) provides a single global IP address for an application
  - This rule routes traffic by IP address, port, and protocol to an HTTP(S) target proxy.
  - A global forwarding rule can only forward to a single port
  - can only be used by an HTTP(S) load balancer
- Target Proxies
  - route incoming HTTP(S) requests based on URL maps and backend service configurations
  - HTTPS target proxies terminate client SSL sessions
  - HTTPS target proxies require configured SSL certs
- Cross-Region
- HTTP(S) only
- Cross-Region using a single IP
- Requests routed to the closest region
- Automatically reroutes to the closest one
- Eliminates need for DNS-Based load balancing.
- Content-based
- HTTP(S) only
- Create multiple backend services to handle content types
- Add path rules to backend services
- /video for video services
- /static for static content
- Configure different instance types for different content types.
- Cloud SSL Proxy
- Non-HTTP(S) traffic
- External
- Performs global load balancing, routing clients to the closest instance with capacity
- Intelligent routing
- Reduced CPU load on instances
- Certificate management
- Security patching
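Going back to the content-based load balancing bullets (path rules like /video and /static), here is a rough sketch of what a URL map does conceptually. `route_request` and the backend names are hypothetical, and longest-prefix matching is my simplification of how a real URL map matches paths.

```python
def route_request(path, path_rules, default_backend):
    """Return the backend service for a request path.

    `path_rules` maps a path prefix such as "/video" to a backend name.
    The longest matching prefix wins.
    """
    best_prefix = None
    for prefix in path_rules:
        # Match the prefix itself or anything below it ("/video/...").
        matches = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if matches and (best_prefix is None or len(prefix) > len(best_prefix)):
            best_prefix = prefix
    return path_rules[best_prefix] if best_prefix is not None else default_backend


rules = {"/video": "video-backend", "/static": "static-backend"}
print(route_request("/video/cats.mp4", rules, "web-backend"))
print(route_request("/checkout", rules, "web-backend"))
```

Each backend service can then sit on instance types suited to its content, which is the whole point of splitting by path.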
For the exam:
- HTTP(S), TCP, and SSL load balancing
- Network load balancing
- https://cloud.google.com/load-balancing/docs/choosing-load-balancer
- Google Cloud SSL proxy terminates the user SSL (TLS) connection at the global load balancing layer, then balances the connections across your instances via SSL or TCP
- Cloud SSL proxy is intended for NON HTTP(S) traffic.
- For HTTP(S) traffic HTTP(S) load balancing is used.

Load balancer	Traffic type	Global/Regional	External/Internal	External Ports for Load Balancing
HTTP(S)	HTTP or HTTPS	Global	External	HTTP on 80 or 8080; HTTPS on 443
SSL Proxy	TCP with SSL offload	Global	External	25, 43, 110, 143, 195, 443, 465, 587, 700, 993, 995, 1883, and 5222
TCP Proxy	TCP without SSL offload. Does not preserve client IP addresses	Global	External	25, 43, 110, 143, 195, 443, 465, 587, 700, 993, 995, 1883, 5222
Network TCP/UDP	TCP/UDP without SSL offload. Preserves client IP addresses.	Regional	External	Any
Internal TCP/UDP	TCP or UDP	Regional	Internal	Any
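
The table above reads nicely as a decision function. This is just my own way of drilling it for the exam; `choose_load_balancer` and its parameters are invented for this sketch.

```python
def choose_load_balancer(protocol, internal=False, needs_global=True,
                         ssl_offload=False):
    """Pick a load balancer type following the table above.

    `protocol` is one of "HTTP", "HTTPS", "TCP", or "UDP".
    """
    protocol = protocol.upper()
    if internal:
        return "Internal TCP/UDP"      # regional, internal
    if protocol in ("HTTP", "HTTPS"):
        return "HTTP(S)"               # global, external
    if protocol == "UDP" or not needs_global:
        return "Network TCP/UDP"       # regional, preserves client IP
    if ssl_offload:
        return "SSL Proxy"             # global TCP with SSL offload
    return "TCP Proxy"                 # global TCP without SSL offload


print(choose_load_balancer("https"))
print(choose_load_balancer("tcp", ssl_offload=True))
print(choose_load_balancer("udp"))
```

It skips two wrinkles from the notes: the SSL and TCP proxies only listen on the specific ports listed in the table, and HTTPS that must terminate on your own instances (pass thru) belongs on the Network load balancer.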

Autoscaling

Part of the Compute Engine API
Used to automatically scale the number of instances in a managed instance group based on workload
Create one autoscaler per managed instance group
Autoscalers can be used with zone-based managed instance groups or regional managed instance groups
Fast, typically ~1min windows.
For the Exam:
- When an autoscaler scales down:
- it determines the number of VM's it needs to shut down.
- Before an instance is terminated, remaining connections are validated, and apps/etc are gracefully shut down
- leverages shutdown scripts.
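
To get a feel for how an autoscaler could decide "the number of VM's it needs", here is a sketch using a simple proportional rule against a target utilization. Big caveat: this rule is an assumption on my part and not Google's documented algorithm, and the function and parameter names are made up. Utilization is passed as whole percentages to keep the math exact.

```python
def desired_instance_count(current_instances, current_utilization_pct,
                           target_utilization_pct, min_instances, max_instances):
    """Proportional scaling rule (an assumption, not Google's algorithm)."""
    if target_utilization_pct <= 0:
        raise ValueError("target_utilization_pct must be positive")
    # Total load expressed as "instances running at the target utilization",
    # rounded up with integer ceiling division.
    load = current_instances * current_utilization_pct
    needed = -(-load // target_utilization_pct)
    # Clamp to the group's configured bounds.
    return max(min_instances, min(max_instances, needed))


scale_up = desired_instance_count(4, 90, 60, min_instances=1, max_instances=10)
scale_down = desired_instance_count(4, 30, 60, min_instances=1, max_instances=10)
print(scale_up, scale_down)
```

When the result is lower than the current count, the difference is the number of VMs that get drained and shut down as described above.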

Case Studies

Nearly 50% of the exam!
Will likely see 2 of the 3 on the exam, I got hit with all 3

Mountkirk Games

What can you do to isolate (development env's) from (staging & production)?
- 2 Projects - (1. dev/test 2. stg/prod)
How should test coverage differ from their existing backends on the other platforms?
- Test ON GCP.
Complete testing process for new versions of the backend before release, and the testing env should scale in an economical way. How to design this?
- Use the existing infra to test the GCP-based backend at scale
CD pipeline, arch includes small services, update and rollback quickly. Redundant across multiple regions. Only front end exposed. Single IP. Immutable artifacts. What products should they use?
- Google Container Registry, GKE, Google HTTP(s) Load Balancer

Dress4Win

Corporate emails to remain available for infrequent viewing by auditors for at least 10 years. Cost is top priority. Which services should you choose?
- Google cloud storage coldline to store the data, and GSUtil to access.
Uptime check for stackdriver is not reporting the services as healthy. What should they do?
- Configure their legacy web servers to allow requests that contain user-agent HTTP header when the value matches GoogleStackDriverMonitoring
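A minimal sketch of the allow rule in that answer, assuming the legacy web servers can run a check like this on request headers. The function names and the `is_known_client` flag are hypothetical. I match the token case-insensitively because I'm not sure the capitalization in my notes matches what the uptime checker really sends.

```python
UPTIME_CHECK_AGENT_TOKEN = "GoogleStackDriverMonitoring"


def is_uptime_check(headers):
    """Return True if the request looks like a Stackdriver uptime check."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    return UPTIME_CHECK_AGENT_TOKEN.lower() in user_agent.lower()


def should_allow(headers, is_known_client):
    # Let uptime checks through even when other filtering would block them.
    return is_known_client or is_uptime_check(headers)


# Illustrative header value only; check what the checker actually sends.
probe = {"User-Agent": "GoogleStackdriverMonitoring-UptimeChecks"}
print(should_allow(probe, is_known_client=False))
print(should_allow({"User-Agent": "curl/7.58.0"}, is_known_client=False))
```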
Disabled external SSH access into prod VMs. Ops team needs to remotely manage GCS objects. Best option?
- Configure a VPN connection to GCP to allow SSH access to the cloud.

TerramEarth

Plans to connect all 20mil vehicles in the field to the cloud. This will increase requirements. How do you design data ingestion?
- Vehicles write data directly to Pub/Sub
Which legacy services will experience changes due to increased GCP adoption?
- Capacity planning, TCO calcs, Opex/Capex allocation.

Tips

Compute engine / groups / templates
IAM - quotas
App Engine
Storage
Pub/Sub
DataProc
Deployment Manager - create deployments or prepackaged solution
Monitoring / Stackdriver
Networking - Loadbalancing - DNS
Quickstart guides
Go through Case Studies and map possible cloud services.
