Snowflake Introduction
Aryama
Posted on January 11, 2024
What is Snowflake?
It is an analytics database, or data warehouse as a service.
It is a SaaS tool to load, analyze and report on massive data volumes.
It provides data storage and data analytics solutions.
It is a data warehouse solution that runs only in the cloud.
Snowflake cloud providers - Microsoft Azure, Amazon Web Services, Google Cloud Platform
Pay-per-second billing model. You pay only for the data you store and the compute you run. When compute is not in use, it is not charged.
Alternatives to Snowflake - Google BigQuery, Amazon Redshift, Databricks
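To make the pay-per-second billing point above concrete, here is a minimal Python sketch. The credit rate, the price per credit and the 60-second minimum are assumptions chosen for illustration, not figures from this post, and the function name `compute_cost` is hypothetical. Actual rates depend on edition, region and cloud provider.

```python
# Hypothetical figures for illustration only; real rates depend on
# edition, region and cloud provider.
CREDITS_PER_HOUR = 1.0      # assumed rate for the smallest warehouse
PRICE_PER_CREDIT = 3.00     # assumed price in USD, not an official figure
MINIMUM_SECONDS = 60        # assumed minimum charge per warehouse start


def compute_cost(seconds_running: int) -> float:
    """Return the compute charge in USD for one warehouse run."""
    if seconds_running <= 0:
        return 0.0  # compute that is not used is not charged
    billable = max(seconds_running, MINIMUM_SECONDS)
    credits = CREDITS_PER_HOUR * billable / 3600
    return round(credits * PRICE_PER_CREDIT, 4)


print(compute_cost(0))      # 0.0
print(compute_cost(90))     # 0.075
print(compute_cost(3600))   # 3.0
```

The charge grows with the seconds the warehouse actually runs, and an idle warehouse costs nothing for compute.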
Snowflake architecture
Architecture diagram:
It has a multi-cluster, shared data architecture.
Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
It decouples compute from storage.
- It has 3 layers -> Storage layer, Compute layer, Cloud Services layer
Storage layer -
At the centre, we have storage. It stores tables and views, and holds both structured and semi-structured data.
Data is compressed and encrypted (AES-256) before it is stored.
Snowflake converts it into an optimised, compressed columnar format (proprietary to Snowflake).
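Snowflake's storage format is proprietary, so the following is only a generic sketch of the idea behind compressed columnar storage, not Snowflake's implementation. It assumes `zlib` as a stand-in codec and JSON as the serialisation, leaves out encryption, and uses hypothetical helper names (`to_columns`, `compress_columns`, `read_column`).

```python
import json
import zlib

# Three sample rows, as they might arrive from a load job.
rows = [
    {"id": 1, "country": "IN", "amount": 120},
    {"id": 2, "country": "IN", "amount": 80},
    {"id": 3, "country": "US", "amount": 200},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {key: [r[key] for r in records] for key in records[0]}


def compress_columns(columns):
    """Compress each column independently (zlib stands in for the real codec)."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(stored, name):
    """Decompress and return a single column without touching the others."""
    return json.loads(zlib.decompress(stored[name]).decode("utf-8"))


columns = to_columns(rows)
stored = compress_columns(columns)
print(read_column(stored, "amount"))  # [120, 80, 200]
```

Because each column is stored separately, a query that needs only `amount` can skip the bytes of every other column.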
Compute layer - Virtual warehouses (VWs) run on the cloud provider's compute instances, such as Amazon EC2 or GCP compute instances. VWs are where queries are executed, and they can be scaled up or down on demand.
They come in various sizes:
• X-Small - Single Node (DDL)
• Small - Two Nodes
• Medium - Four Nodes (Data load)
• Large - Eight Nodes
• X-Large - Sixteen Nodes (Data processing)
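The size list above follows a doubling pattern, which a short Python sketch can express. The node counts are taken from the list in this post. The `SIZES` list and `nodes_for` function are hypothetical names for illustration and not part of any Snowflake API.

```python
# Sizes in the order listed above; each step doubles the node count.
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large"]


def nodes_for(size: str) -> int:
    """Return the node count for a warehouse size, per the list above."""
    return 2 ** SIZES.index(size)


for size in SIZES:
    print(f"{size}: {nodes_for(size)} node(s)")
```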
Storage and compute are charged independently, and only for what you use.
E.g. if you store terabytes of data and run no processing, you are charged only for storage, not for compute.
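The storage-only example above can be sketched as a bill with two separate line items. Both rates below are hypothetical placeholders, not published prices, and `monthly_bill` is an illustrative name.

```python
# All rates below are hypothetical placeholders, not published prices.
STORAGE_USD_PER_TB_MONTH = 23.0
COMPUTE_USD_PER_CREDIT = 3.0


def monthly_bill(stored_tb: float, credits_used: float) -> dict:
    """Return storage and compute charges as separate line items."""
    storage = stored_tb * STORAGE_USD_PER_TB_MONTH
    compute = credits_used * COMPUTE_USD_PER_CREDIT
    return {"storage": storage, "compute": compute, "total": storage + compute}


# 5 TB stored, no queries run: only the storage line is non-zero.
print(monthly_bill(stored_tb=5, credits_used=0))
```

With no credits used, the compute line is zero and the total equals the storage charge.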
Cloud Services layer -
- Authentication & Authorisation
- User & Session Management
- Query Compilation, Optimisation & Data caching
- Virtual Warehouse Management, Coordination of Data Storage/Updates & Transactions
- Metadata Management - Zero Copy Cloning , Time Travel, Data Sharing
- Manage and Maintain the life cycle of a query
Features
Virtually Unlimited Storage & Compute - Storage and compute scale elastically with built-in redundancy, so you can store more data and scale compute up or down as needed.
Supported by all the major cloud providers
Data Platform as a Service - There is virtually no software to install, configure or manage. Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.
Time Travel Feature & Fail Safe - As part of the continuous data protection lifecycle, Snowflake allows you to access historical data (table, schema or database) at any point within the defined retention period.
Clone or Zero Copy Clone - Creates a copy of a database, schema or table without actually copying the data. The clone is a snapshot of the source object's data at the time of cloning. It is a writable object, independent of the source object. Cloning is just an SQL statement, and since the clone needs no additional storage when it is created, many data copy challenges can be easily solved. Cloning is also used to build environments, such as Prod to QA or QA to Dev or vice versa, without extra storage cost for the copy itself.
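The ideas behind zero-copy cloning and Time Travel can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not how Snowflake is implemented: a table is modelled as a list of references to immutable blocks, history is addressed by a simple version index rather than a timestamp, and no retention period is enforced. The `Table` class and its methods are hypothetical.

```python
class Table:
    """Toy table: a list of references to immutable data blocks."""

    def __init__(self, blocks=None):
        self.blocks = list(blocks or [])     # current state (references only)
        self.history = [tuple(self.blocks)]  # past states, for time travel

    def insert(self, block):
        """Add a new block and record the resulting state."""
        self.blocks.append(block)
        self.history.append(tuple(self.blocks))

    def clone(self):
        """Copy the reference list only; the blocks themselves are shared."""
        return Table(self.blocks)

    def at_version(self, version):
        """Return the table's block references as of an earlier version."""
        return list(self.history[version])


prod = Table()
prod.insert(("row-1", "row-2"))          # prod version 1
qa = prod.clone()                        # no data copied
qa.insert(("row-3",))                    # write to the clone only

print(len(prod.blocks), len(qa.blocks))  # 1 2
print(prod.blocks[0] is qa.blocks[0])    # True: the block is shared
prod.insert(("row-4",))                  # prod version 2
print(prod.at_version(1))                # [('row-1', 'row-2')]
```

The clone is writable and independent of the source, yet the shared block exists only once, which is why the copy itself needs no extra storage in this model.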
Support for semi-structured data
On-demand pricing
Micro-partitions