AzureFunBytes Episode 43 - Intro to @Azure Data Factory with @KromerBigData
Jay Gordon
Posted on May 13, 2021
AzureFunBytes is a weekly opportunity to learn more about the fundamentals and foundations that make up Azure. It's a chance for me to understand more about what people across the Azure organization do and how they do it. Every week we get together at 11 AM Pacific on Microsoft LearnTV and learn more about Azure.
Data drives so many of our decisions. Whether it's determining which products to show first in our online retail store, or creating reports for business intelligence, we've got so much data! It's time to learn how to take that data and turn it into human-readable information that will help us continue to make the right decisions.
This week on AzureFunBytes, I am joined by Principal Program Manager Mark Kromer to talk about how to store and process our big data with Azure Data Factory. Mark will discuss the ETL (Extract, Transform, Load) process that gets our data into Azure Data Factory. I ask Mark how we can transfer the data we might have to Azure. We look into how to create pipelines to automate the ingestion of our data from various data stores.
00:04:35 - Intro to Mark
00:09:45 - Let's meet Data Factory
00:14:48 - CI/CD With Data Factory Pipelines
00:20:32 - Azure Data Factory connector overview
00:31:57 - Demo Time
Our Agenda:
- Intro to Data Factory
- Differences between ADF & Synapse
- Data Flows in ADF & Synapse
- Data lake ETL patterns
- Build an ETL flow using taxi sample data (Demo)
- Q&A
From the Azure Documentation, "What is Azure Data Factory?":
Azure Data Factory is the platform that solves such data scenarios. It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
Integrate all your data with Azure Data Factory—a fully managed, serverless data integration service. Visually integrate data sources with more than 90 built-in, maintenance-free connectors at no added cost. Easily construct ETL and ELT processes code-free in an intuitive environment or write your own code. Then deliver integrated data to Azure Synapse Analytics to unlock business insights.
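If the extract, transform, load idea is new to you, here is a minimal sketch of the pattern in plain Python. It is not Azure Data Factory code and it is not what Mark builds in the demo; it only mirrors the shape of an ETL flow on a few made-up taxi rows. The column names (`medallion`, `pickup_datetime`, `trip_distance`, `total_amount`) and the sample values are assumptions for illustration, and the real trip data and trip fare files may look different.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The real taxi files have more columns, and
# these headers and values are assumptions made for this sketch.
TRIP_DATA_CSV = """medallion,pickup_datetime,trip_distance
A1,2013-01-01 00:05:00,1.5
A1,2013-01-01 01:10:00,3.25
B2,2013-01-01 00:30:00,2.0
"""

TRIP_FARE_CSV = """medallion,pickup_datetime,total_amount
A1,2013-01-01 00:05:00,7.50
A1,2013-01-01 01:10:00,14.00
B2,2013-01-01 00:30:00,9.25
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(trips, fares):
    """Transform: join trips to fares, then total per medallion."""
    fare_by_key = {
        (f["medallion"], f["pickup_datetime"]): float(f["total_amount"])
        for f in fares
    }
    totals = defaultdict(lambda: {"trips": 0, "distance": 0.0, "fare": 0.0})
    for t in trips:
        key = (t["medallion"], t["pickup_datetime"])
        if key not in fare_by_key:
            continue  # inner join: skip trips with no matching fare row
        agg = totals[t["medallion"]]
        agg["trips"] += 1
        agg["distance"] += float(t["trip_distance"])
        agg["fare"] += fare_by_key[key]
    return [{"medallion": m, **v} for m, v in sorted(totals.items())]


def load(rows):
    """Load: write rows to CSV text, standing in for a real sink."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["medallion", "trips", "distance", "fare"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    summary = transform(extract(TRIP_DATA_CSV), extract(TRIP_FARE_CSV))
    print(load(summary))
```

In Data Factory, each of these three steps would be handled by the service (connectors for the sources and sink, a data flow for the join and aggregate) rather than by hand-written functions, but the order of operations is the same.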
Learn about Azure fundamentals with me!
The live stream is available on Twitch, YouTube, and LearnTV at 11 AM PT / 2 PM ET every Thursday. You can also find the recordings here:
AzureFunBytes on Twitch
AzureFunBytes on YouTube
Azure DevOps YouTube Channel
Follow AzureFunBytes on Twitter
Useful Docs:
Get $200 in free Azure Credit
Microsoft Learn: Introduction to Azure fundamentals
Microsoft Learn: Integrate data with Azure Data Factory or Azure Synapse Pipeline
Microsoft Learn: Data integration at scale with Azure Data Factory or Azure Synapse Pipeline
Azure Data Factory
Azure Data Factory documentation
Azure Data Factory Tutorials
Extract, transform, and load (ETL)
Transferring data to and from Azure
Big data architecture style
Watch our snack-sized video tutorials here to learn more about building ETL with data flows
Follow the Delta Lake tutorial here to build your own lake
Branching and chaining activities in an Azure Data Factory pipeline using the Azure portal
For access to the taxi medallion sample data to build these pipelines on your own, visit Mark's sample data repo here and look for trip data and trip fare