Erin Fox
Posted on September 20, 2021
You might be thinking, WFT is ETL and have I been using it all this time?!
If you're an engineer, you probably have done some form of ETL. I never realized I was extracting, transforming and loading data throughout my career until researching it recently. I also, need to get better at it and the best way I know how is by researching, practice and writing about it.
I'm still working on learning it with more complicated data structures and data transformations, but I wanted to break it down to the beginning to make sure I understand it each step of the way. But with cats added.
What is ETL?
ETL = Extract, Transform, Load
ETL is a series of steps to move data from one location to another. When doing this, it transforms the data structure before it is loaded from its source to its new destination. In more words, it is a process you can use to help plan and execute the movement of data that you need.
Why use ETL?
I'm sure there are several answers to this question. For me, using it breaks down the steps of gathering and retrieving data. It also forces you to understand the shape of the data, what data you need, and how eventually you want it to look before rendering it in your app, browser or database.
A more fancier definition for why we use it: Data from different sources can be pulled together and restructured to a standardized format.
Let's walk through each step of extracting, transforming and loading data with React and Rails.
Extract - PLAN IT
Extract, is all about planning for the transforming. There are 3 steps or questions to ask yourself to find the answers needed in order to move on to the next step, transform.
- Where is the data that I need?
- Get it locally from your routes file?
- From another endpoint or 3rd party API like
the Spotify API?
- For our example, we will use hardcoded code found in our controller.
def cats
render json: {
cats: [ # Cat.all
{
name: "Maya",
color: "calico",
rating: "perfect",
owners: [
"Mark"
]
},
{
name: "Sully",
color: "seal bicolor",
rating: "perfect",
owners: [
"Erin"
]
}
]
}
end
-
What specific information do I need from that data?"
- Decide what data to extract
- In our example, let's extract the colors of the cats. So we want to return only the colors.
- Decide what data to extract
-
What should I use to retrieve that data?
- Query/retrieve the data
- A fetch request with JS on the frontend?
- A
Net::HTTP
with Ruby?- For our example, we will use
request
which is a custom little thing we built internally. It is build off JS fetch.
- For our example, we will use
This handleGet
function is the main way we will extract and receive the data we need.
async function handleGet() {
const response = await request("/some-cool-route/cats") // built off js fetch, so this is a GET request
console.log("handleGet", response)
}
And our console log, would look like this:
Transform - DO IT
Now that we have learned where the data is (in a method within the controller), what part of it we need (the cat's colors) and how to retrieve the data (a fetch GET request using an internal tool we use) we can now start changing the data to the shape we want. We can restructure it, rename it, remove things we don't need and even add values.
- What should the data structure look like?
- Since our example is small, we are looking
only to return the cat's colors. We don't
need the name, rating or owners.
- We would want our transformed data to look like this if we were to console log it.
- Since our example is small, we are looking
only to return the cat's colors. We don't
need the name, rating or owners.
We can transform the cats array, to return only the cat colors by creating a function that takes the data (cat's array) and returns a new data structure (an array of cat colors).
With our data in a method in our controller, let's look at our react component that will render the page.
This is were we can create a transform function transformColors()
that will return an array of each cat's color: ["calico", "seal bicolor"]
function transformColors(cats) {
return cats.map(cat => cat.color)
}
Load / Render - SHOW IT
- Where should the data then be loaded or rendered?
- Add it to the database or display it to the user
- In some situations, you may be adding this new array of cat colors to your database.
- I mostly work with rendering the data to the page with React components, so let's see it all play out that way.
Here is the react component rendering our new transformed data.
import React, { useState } from "react"
import { request } from "react-rb" // internal tool by CK
export default function Meow() {
const [cats, setCats] = useState([])
const [colors, setColors] = useState([])
async function handleGet() {
// EXTRACT
const response = await request("/some-cool-route/cats") // built off js fetch, so this is a GET request
setCats(response.data.cats)
const transformCatColors = transformColors(response.data.cats)
setColors(transformCatColors)
}
return (
<div className="m-8">
<button onClick={handleGet}>Get cat colors 🐈 🐈⬛</button>
// LOAD/RENDER
<div>{colors}</div>
</div>
)
}
// TRANSFORM
function transformColors(cats) {
return cats.map(cat => <div key={cat.color}>{cat.color}</div>)
}
Let's recap. We have Extracted the data from our controller using a fetch request. We then Transformed that data to return only the cat colors with our transformColors()
function. And finally, we can Load/Render it to the page in our React component in the JSX.
Here's a gif it it all working! It is not pretty, but hopefully you get the idea.
Hopefully this small example helps explain ETL just a little but more!
A big shoutout to my coworker Mark M. for helping me grasp this concept even further and for setting up this awesome cat example.
Posted on September 20, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.