AWS Cloud9 for Data Engineers
Gaurav Thalpati
Posted on March 17, 2023
This article was originally posted on my Substack. I'm sharing it here with fellow community builders.
I regularly run quick PoCs for my day-to-day analysis and R&D work, and these often require installing various software, applications, databases, and tools. I used to run them in Docker containers using Docker Desktop on my Windows laptop, but that laptop has only 8GB of RAM, which is not ideal for this kind of work. That’s why I’ve shifted to AWS Cloud9, an AWS service that helps you get PoC work done quickly.
Here is a quick guide on using Cloud9 for Data Engineering PoCs.
What is Cloud9?
AWS Cloud9 is a cloud-based IDE for development work. It runs on an EC2 instance, and you can choose the instance size based on the workload you want to run.
It provides an IDE to write, run, and debug code and supports Python, JavaScript, and many other languages. The best thing is that it integrates with AWS services like S3, so you can easily download and upload files from and to S3 within Cloud9. It also supports collaborative development and has a chat facility for working with other developers.
AWS Cloud9 is not a data-specific service and is not much discussed within the data community. But it is one of the best services for making data engineering (DE) work easier and quicker.
Why should you use Cloud9 for DE PoCs?
- Single interface to perform various activities like creating code, running bash commands, transferring files to S3, running AWS CLI commands, and pushing code to git.
- Easy to install new tools using Docker.
- Provision an EC2 instance sized to your needs. No need to worry about a powerful laptop with 16GB+ RAM. (I generally use m5.xlarge with 16GB RAM.)
- Start and stop without losing your installed software. You pay for compute only while the instance is running; the attached storage is still billed while it is stopped.
- All good features of EC2 + simplicity of doing all things in one place
Below are some of the DE activities that Cloud9 can be used for.
Use Case #1 | Editing S3 files quickly
Scenario: You want to create and upload some dummy data to S3.
You can easily create a new file in Cloud9 and upload it in just a couple of clicks to your S3 bucket.
If you want to add more columns to this file or add more records, you can download the file, make changes and upload it back - without leaving your Cloud9 terminal.
Supports multiple AWS Services along with S3
You can also execute simple shell commands to make changes to files. If you love running sed or awk one-liners, you can definitely try it out!
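As a rough sketch of this workflow, the commands below create a small dummy CSV, add a column with an awk one-liner, and append a record. The file names, column names, and the `BUCKET` variable are made up for illustration; the upload step runs only if you set `BUCKET` to a bucket you own.

```shell
# Create a small dummy CSV (file and column names are illustrative).
cat > customers.csv <<'EOF'
id,name
1,Alice
2,Bob
EOF

# Add a "country" column to every row with an awk one-liner.
awk 'BEGIN { FS = OFS = "," } NR == 1 { print $0, "country"; next } { print $0, "IN" }' \
  customers.csv > customers_v2.csv

# Append one more record.
echo '3,Carol,US' >> customers_v2.csv

cat customers_v2.csv

# Upload only when BUCKET is set, e.g. BUCKET=my-poc-bucket (hypothetical name).
if [ -n "${BUCKET:-}" ]; then
  aws s3 cp customers_v2.csv "s3://${BUCKET}/dummy/customers_v2.csv"
fi
```

To pull the file back later, the same `aws s3 cp` command works with the source and destination swapped.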
Use Case #2 | Running AWS CLI commands
You can execute AWS CLI commands directly from the Cloud9 terminal without configuring any credentials, because Cloud9 uses AWS managed temporary credentials by default.
AWS CLI is preinstalled on the Amazon Linux 2 machine.
Scenario: You want to check the IAM users in your account.
You can execute the AWS CLI commands for the IAM service.
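For this scenario, one possible command is `aws iam list-users`. The sketch below wraps it in a small helper; the function name `list_iam_users` and the choice of columns are my own assumptions, and the call only succeeds if your credentials are allowed to list IAM users.

```shell
# Hypothetical helper: show IAM user names and creation dates as a table.
list_iam_users() {
  aws iam list-users \
    --query 'Users[].[UserName,CreateDate]' \
    --output table
}

# Usage (run in the Cloud9 terminal):
# list_iam_users
```

The `--query` option filters the JSON response on the client side, so you can adjust the listed fields without changing the API call.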
Use Case #3 | Creating Python Scripts
If you want to create quick Python scripts for your DE work, you don’t need to open PyCharm or other editors. You can simply do it in Cloud9 itself.
Change the language mode from “Text” to “Python” to get Python syntax support in the editor.
Save the file with a .py extension and run it in the terminal.
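If you prefer to stay in the terminal, here is a minimal sketch that writes a tiny Python script and runs it. The file name `row_count.py` and the sample data are hypothetical, and it assumes `python3` is available on the instance.

```shell
# Write a tiny Python script (hypothetical file name) and run it.
cat > row_count.py <<'EOF'
import csv
import io

# Inline sample data so the script runs without any input file.
SAMPLE = "id,name\n1,Alice\n2,Bob\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(f"rows={len(rows)}")
EOF

python3 row_count.py   # prints rows=2
```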
Use Case #4 | Running Docker containers
Scenario: You want to run Spark quickly and try out some simple commands for learning purposes. There are many options to use - Glue, EMR, Databricks. One of the easiest ways is to run Spark on Docker using Cloud9.
Docker is preinstalled if you selected an Amazon Linux 2 machine while creating the Cloud9 instance.
To confirm that Docker is installed, run the command below.
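One simple check, sketched here as an assumption about which command is meant, is to print the Docker client version with `docker --version`. The `check_docker` wrapper is a hypothetical helper that prints a message instead of failing when Docker is missing.

```shell
# Hypothetical helper: print the Docker client version if Docker is available.
check_docker() {
  if command -v docker >/dev/null 2>&1; then
    docker --version
  else
    echo "docker is not installed"
  fi
}

check_docker
```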
Now you can pull the Spark (Python) image from Docker Hub and start the PySpark shell using the commands below.
docker pull apache/spark-py
docker run -it apache/spark-py /opt/spark/bin/pyspark
You can follow the same approach for running other tools like Kafka, MySQL, and many others.
This Cloud9 instance does not come with docker-compose, which might be required for other software.
To install docker-compose, run the commands below.
sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
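docker-compose version   # Validate that it is working
Note: For Amazon Linux 2, the file you need to install is docker-compose-linux-x86_64. If the download above fails, replace the $(uname -s)-$(uname -m) part of the URL with linux-x86_64.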
Use Case #5 | Uploading code to git
Finally, when your work is done and you want to save it for future reference, you can easily upload it to git. Cloud9 integrates with git, so you can quickly pull from and push to your git repos.
Scenario: You want to push the Python code you created earlier to your git repo.
Configure git from the left-hand pane using the “Source Control” option. The first time, clone the repo by providing the repo link. Cloud9 will then identify your changes and mark them accordingly. You can also use manual commands like add, commit, and push.
Configure the git repo in the source control
Note: You will have to provide your git user name and personal access token when pushing new changes to your git repo.
Validate the changes in your git repo.
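If you go the manual route, the add, commit, and push steps can be sketched as a small helper. The function name `push_file`, the file name, and the branch name are hypothetical; it assumes you are inside an already cloned repo whose remote is named `origin`.

```shell
# Hypothetical helper: stage, commit, and push one file.
# Arguments: file path, commit message, branch name.
push_file() {
  git add "$1" &&
  git commit -m "$2" &&
  git push origin "$3"
}

# Example call (file and branch names are hypothetical):
# push_file my_script.py "Add PoC script" main
```

Chaining the steps with `&&` stops the sequence as soon as one step fails, so nothing is pushed if the commit did not succeed.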
Note: Once you finish your work, close the Cloud9 window; otherwise, the instance will keep running. You can also go to the EC2 service in the console and stop the Cloud9 instance directly to save some money.
These are just a few use cases of Cloud9 for DE work. You can explore and leverage other features for your day-to-day R&D work, PoCs, learning, and training activities.
And Cloud9 is not just for PoCs or educational work. You can also use it in your actual projects, where it supports collaborative coding, chat with fellow developers, and many other features.
You can try these out, and if you have any comments/suggestions/questions, please let me know.