Building a Jupyter Notebook Environment in Docker for Data Analysis on AWS EC2
Zahraa Jawad
Posted on September 30, 2024
Outline
- What is Jupyter Notebook
- Docker in the AWS environment with the Jupyter Notebook
- Install Jupyter Notebook using Docker in an AWS environment
What is Jupyter Notebook
JupyterLab: A Next-Generation Notebook Interface
JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.
Docker in the AWS environment with the Jupyter Notebook
In this article, we put Docker on AWS to practical use: we build a data analysis environment with Jupyter Notebook inside a Docker container and run it on an AWS EC2 instance. This setup offers several important benefits, especially for data analysis and data science:
1. Portability and Replication
Docker containers ensure that your work environment is consistent across different systems. You can easily move the container between different machines without worrying about system compatibility.
2. Ease of Setup and Operation
With AWS EC2, you can quickly set up a new instance and launch a Docker container. This reduces the time and effort required to set up a data analysis environment, allowing you to focus on the actual work instead of setting up the infrastructure.
3. Easily Scalable
AWS EC2 provides the flexibility to scale resources as needed. You can increase or decrease the size of the instance based on your analysis requirements, saving operational costs and ensuring optimal performance.
4. Remote Access and Collaboration
Jupyter Notebook provides an interactive web interface that can be accessed from anywhere. This facilitates collaboration between teams, as multiple users can access the same environment and work on the same projects in real-time.
5. Integration with Big Data Tools
You can integrate Jupyter Notebook with big data tools like Apache Spark and Hadoop, making it easier to analyze and visualize big data.
6. Data Security
With AWS, you can use advanced security features like Identity and Access Management (IAM), Virtual Private Cloud (VPC), and encryption, ensuring your sensitive data is protected.
Install Jupyter Notebook using Docker in an AWS environment
To install Jupyter Notebook using Docker in an AWS environment, follow these steps:
Step 1: "Launch Instance"
After logging in to your AWS account, select the EC2 service from the Services menu or through the search box.
Click on Launch instance
Under Name and tags:
Enter a name to identify your instance. For this tutorial, name the instance "Jupyter Notebook".
Under Application and OS Images:
From Quick Start, choose an AMI that meets your web server needs
Here we choose Ubuntu (which is free tier eligible)
Under Instance type:
Choose the instance type. Here we choose t2.micro (which is free tier eligible).
Under Key pair (login):
Choose an existing key pair, or create a new one:
Give the key pair a name, then click Create key pair.
Under Network settings, under Firewall (security groups):
Choose Create security group
Select the Allow SSH traffic check box
Leave all other configurations as they are (default settings)
In the Summary panel, review your instance configuration and then choose Launch instance.
Once the launch has been successfully initiated, click the instance ID to view the instance.
Your instance will first be Pending, and will then go into the Running state.
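The same launch can also be scripted with the AWS CLI instead of the console. The following is only a rough sketch: it assumes the AWS CLI is installed and configured, and the function name, the AMI ID, the key pair name, the security group ID, and the Name tag value are all hypothetical placeholders that you would replace with your own.

```shell
# Hypothetical helper: launch one t2.micro instance with the AWS CLI.
# $1: AMI ID of an Ubuntu image in your region (placeholder)
# $2: name of an existing key pair (placeholder)
# $3: ID of an existing security group that allows SSH (placeholder)
launch_jupyter_instance() {
  aws ec2 run-instances \
    --image-id "$1" \
    --instance-type t2.micro \
    --key-name "$2" \
    --security-group-ids "$3" \
    --count 1 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=jupyter-notebook}]'
}
```

A call might look like launch_jupyter_instance ami-xxxxxxxx my-key sg-xxxxxxxx. Unlike the console wizard, this sketch expects the security group to exist already.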
Step 2: "Connect to the instance"
To connect to your instance, select the instance and choose Connect.
There are many ways to connect to an EC2 instance; here we use the SSH client.
After selecting the "SSH client" tab, copy the commands shown there and execute them in the terminal, following these steps:
Open Terminal (here we use Git Bash)
Use the cd (change directory) command to move to the directory where you downloaded your .pem file (key pair).
In this article, the .pem file is stored in the Downloads folder.
Execute the cd command to change to the location of the key file:
cd Downloads/
Execute the following commands in order:
1- chmod 400 key-pair-name.pem
2- ssh -i /path/key-pair-name.pem instance-user-name@instance-public-dns-name
For the Ubuntu AMI chosen above, the instance user name is ubuntu.
After the ssh command is executed, you will be prompted to type “yes” to continue with the connection.
And that’s it! Now we’re logged in to our AWS instance.
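The two connection commands above can be combined into a small helper. This is a minimal sketch: the function name is made up for illustration, and it assumes an Ubuntu AMI, where the default login user is ubuntu.

```shell
# Hypothetical helper: connect to an Ubuntu EC2 instance over SSH.
# $1: path to the .pem key file
# $2: public DNS name or public IP of the instance
connect_to_instance() {
  # SSH refuses private keys that other users can read, so restrict the file first.
  chmod 400 "$1"
  # "ubuntu" is the default user on official Ubuntu AMIs.
  ssh -i "$1" "ubuntu@$2"
}
```

It would be called as connect_to_instance key-pair-name.pem instance-public-dns-name, using the same placeholders as the commands above.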
Now we get root permissions by executing the sudo -i command.
The "sudo -i" command starts a login shell as the root user on Linux. It gives you full administrative (root) privileges, allowing you to run commands and operations that require them.
We update the package lists and upgrade the installed packages with the command:
sudo apt update && sudo apt upgrade -y
Docker installation:
I used Docker installed on an instance in the AWS account, and this was explained in the article:
• Create a Dockerfile
Create a Dockerfile describing how to build the image. This file contains the instructions necessary to install and configure the application within the image.
We can create it with the nano editor:
nano Dockerfile
Write the instructions necessary to install and configure the application:
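The exact Dockerfile contents are not reproduced in this text. As a minimal sketch of what such a file might contain, the example below assumes the python:3.11-slim base image and installs the notebook package with pip; both choices are assumptions and not necessarily what was used in this article. It writes the file with a shell heredoc, but the same lines can be typed into nano instead.

```shell
# Write a hypothetical Dockerfile (assumed contents, not the article's exact file).
cat > Dockerfile <<'EOF'
# Assumed base image: a slim official Python image
FROM python:3.11-slim
WORKDIR /app
# Install Jupyter Notebook
RUN pip install --no-cache-dir notebook
# Jupyter listens on port 8888 by default
EXPOSE 8888
# Listen on all interfaces so the published port is reachable from outside the container
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
EOF
```

The --ip=0.0.0.0 option matters here: if Jupyter only listens on localhost inside the container, the port published by docker run will not answer requests from outside.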
Then save the file and exit nano:
Ctrl+X: to exit
Y: to confirm saving, then press Enter to keep the file name.
Now to build the Jupyter Notebook image we execute the following command:
docker build -t my-jupyter-notebook .
The image is built successfully.
To verify this, we list the local images with the command:
docker images
Run the container in the background (-d) and publish port 8888 (-p) using the following command:
docker run -d -p 8888:8888 my-jupyter-notebook
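By default, Jupyter protects the web interface with a login token and prints a URL containing that token when it starts. Assuming the image keeps this default, the token can be read from the container log. The helper below is a small sketch with a made-up function name; it takes the container ID shown by docker ps.

```shell
# Hypothetical helper: print the login URLs (including the token) from a container's log.
# $1: container ID or name, as shown by "docker ps"
jupyter_login_urls() {
  # Jupyter writes its startup messages to stderr, so merge stderr into stdout.
  docker logs "$1" 2>&1 | grep -o 'http://[^ ]*token=[^ ]*'
}
```

The printed URLs use the container's own host name or 127.0.0.1; when opening Jupyter from your browser, keep the token part and replace the host with the instance's public IPv4 address.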
Access to Jupyter Notebook
To access Jupyter Notebook, we must open its port, and this is done through the following steps:
- Go back to the instance and select it by clicking on the checkbox, then go to the Security tab
- Open the security group by clicking on it
- Choose Inbound rules, then Edit inbound rules
- Click Add rule
- Enter the rule for port 8888
Note: In practice, it is not advisable to leave the source as 0.0.0.0/0, because it exposes the port to the whole internet and is easy to attack. We use it here only to learn the lab and the building process.
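A safer alternative to 0.0.0.0/0 is to open port 8888 only to your own IP address. As a sketch, assuming the AWS CLI is installed and configured, this could be done as follows; the function name, the security group ID, and the IP address are hypothetical placeholders.

```shell
# Hypothetical helper: allow inbound TCP 8888 from a single IPv4 address.
# $1: security group ID (placeholder)
# $2: your public IPv4 address (placeholder)
allow_jupyter_from() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 8888 \
    --cidr "$2/32"
}
```

A call might look like allow_jupyter_from sg-xxxxxxxx 203.0.113.10. The /32 suffix limits the rule to exactly one address.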
Now go back to the instance and select it, then go to Details and copy the Public IPv4 address.
Paste the public IPv4 address followed by port 8888 into the browser (in the form http://public-IPv4-address:8888) and press Enter.
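The address can also be looked up from the command line rather than copied from the console. The sketch below assumes a configured AWS CLI; the function name and the instance ID are hypothetical.

```shell
# Hypothetical helper: print the Jupyter URL for a running instance.
# $1: instance ID (placeholder)
jupyter_url() {
  # Ask EC2 for the public IPv4 address of the given instance.
  public_ip="$(aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text)"
  echo "http://${public_ip}:8888"
}
```

Note that the URL uses http, not https, because this setup does not configure TLS for Jupyter.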
Jupyter Notebook has been successfully built and you can work on it