Data Engineering Projects for Beginners
Ramses Alexander Coraspe
Posted on June 15, 2022
Hi everyone,
I am a little bit obsessed with data engineering, and lately I have been working on several open source projects on the topic. Here is a list of the repositories and the technologies used in each one. If you decide to go deeper into this fun world, these repositories could serve as a guide.
❤ means "I like this one"
❤ Tracking your Uber Rides and Uber Eats expenses through a data engineering process
Technologies and skills:
Python, Docker, Apache Airflow, AWS Redshift, Power BI, Data modelling, Task scheduling, ETL and ELT processes, Data warehousing, Cloud
❤ Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Technologies and skills:
Python, Docker, Big Data, Cloud, BigQuery, Workflow Engines, Google Cloud Platform (GCP), Task scheduler, Dataproc cluster, Google Cloud Storage (GCS), Redis, DAG, Parallel Processing, Apache Spark
❤ Building Big Data Pipelines in the Cloud with AWS EMR
Technologies and skills:
Python, PySpark, AWS EMR, Task Scheduling, IaC, EC2 Instances, Apache Spark, Cloud
❤ Building a Lossless Data Compression and Data Decompression Pipeline
Technologies and skills:
Python, Data compression, BZIP2, Parallel programming
Learn how to dockerize an Apache Spark Standalone Cluster
Technologies and skills:
Python, Jupyter Notebook, Apache Spark, Docker, docker-compose, Hive
❤ Dockerizing and Consuming an Apache Livy environment
Technologies and skills:
Python, Big Data, Docker, docker-compose, Apache Livy, Apache Spark, PostgreSQL, PySpark, Jupyter Notebook
❤ Design, Development and Deployment of a simple Data Pipeline
Technologies and skills:
Python, Data modelling, Docker, docker-compose, PostgreSQL, Data pipeline, FastAPI
Dockerizing a Python Script for Faster Web Scraping
Technologies and skills:
Python, Docker, SQLite, Dockerfile, Web scraping, Data pipeline, FastAPI
Understanding Similarity Measures for Text Analysis
Technologies and skills:
Python, Machine Learning, Similarity measures, Distance metrics, Text Analysis
❤ Learn how to build a content-based Movie Recommender System
Technologies and skills:
Python, Machine Learning, TF-IDF, Cosine similarity, BM25, BERT, NLP, word2vec, Text Analysis, recsys
A Text Analysis of Speeches
Technologies and skills:
Python, Machine Learning, NLP, word2vec, Text Analysis, Sentiment Analysis, PCA, t-SNE, Word Embeddings, Text Preprocessing, Web scraping, Data Visualization, Mexico
❤ Dropout Students Prediction
Technologies and skills:
R, Genetic algorithm, Neural Networks, K-Means, Clustering, Machine Learning
I will be working on more complex projects over the next few months using modern data stacks.