How I built a real-time Machine Learning system with Kafka, Elasticsearch, Kibana, and Docker

dipankarmedhi

Dipankar Medhi

Posted on December 28, 2022


We will design and build a real-time sentiment analysis and hate detection system.

I built this project for the Turn Language into Action, Natural Language Hackathon by Expert.ai.

I have always been interested in real-time systems and wondered how they work under the hood.

HOW? 🤔

So, I found this hackathon to be a perfect opportunity for me to learn and build something new.

Well then, Let's ROLL!!!

Project Architecture

This is what the complete pipeline looks like. Don't worry, I will cover everything in detail.

[Image: project architecture diagram]

But before we move on with the tools and architecture, let me talk about our data sources.

I used the Twitter API for real-time tweets, specifically Python's tweepy library for streaming them. In addition, I used NewsAPI for daily news articles.
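To keep it short, the sketch below covers only the NewsAPI side: it fetches top headlines and flattens each article into a small dict ready to publish. The endpoint and the response fields (`articles`, `title`, `description`, `url`, `publishedAt`) follow NewsAPI's public v2 format; the message shape, the `country` default, and the `NEWSAPI_KEY` environment variable are assumptions for illustration, not the project's actual code.

```python
import json
import os
import urllib.parse
import urllib.request

# NewsAPI v2 "top headlines" endpoint; the query parameters below are illustrative choices.
NEWSAPI_URL = "https://newsapi.org/v2/top-headlines"


def fetch_headlines(api_key: str, country: str = "us") -> dict:
    """Fetch one page of top headlines from NewsAPI and return the decoded JSON."""
    query = urllib.parse.urlencode({"country": country, "apiKey": api_key})
    with urllib.request.urlopen(f"{NEWSAPI_URL}?{query}", timeout=10) as resp:
        return json.load(resp)


def to_messages(payload: dict) -> list[dict]:
    """Flatten a NewsAPI response into small dicts ready to publish to Kafka."""
    messages = []
    for article in payload.get("articles", []):
        # Prefer the description; fall back to the title when it is missing.
        text = article.get("description") or article.get("title") or ""
        if not text:
            continue  # nothing to classify
        messages.append({
            "source": (article.get("source") or {}).get("name"),
            "title": article.get("title"),
            "text": text,
            "url": article.get("url"),
            "published_at": article.get("publishedAt"),
        })
    return messages


if __name__ == "__main__":
    # NEWSAPI_KEY is a hypothetical environment variable name.
    data = fetch_headlines(os.environ["NEWSAPI_KEY"])
    for msg in to_messages(data):
        print(msg["title"])
```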

I used Docker to run all the necessary tools as containers for this project.

Now let's talk about each component.

Apache Kafka

For ingesting the real-time data, I have used Apache Kafka.

Now, what is Apache Kafka? Well,

"Apache Kafka (Kafka) is an open source, distributed streaming platform that enables (among other things) the development of real-time, event-driven applications." (IBM)

Since the project is written in Python, I used kafka-python, a client library that makes working with Kafka relatively easy.

Using the KafkaProducer, I sent the messages (from Twitter and NewsAPI) to the KafkaConsumer through two Kafka topics: one for the tweets and the other for the news articles.
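A minimal producer sketch with kafka-python might look like this. The topic names `tweets` and `news` and the JSON message format are assumptions; the post only says there is one topic per source. The kafka import sits inside `publish()` so the small routing and serialization helpers can be used without a running broker.

```python
import json

# Hypothetical topic names: the post only says there is one topic per source.
TOPICS = {"twitter": "tweets", "newsapi": "news"}


def serialize(message: dict) -> bytes:
    """Encode a message dict as UTF-8 JSON bytes for Kafka."""
    return json.dumps(message).encode("utf-8")


def topic_for(source: str) -> str:
    """Route a message to the Kafka topic for its data source."""
    try:
        return TOPICS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None


def publish(messages, source: str, bootstrap_servers: str = "localhost:9092") -> None:
    """Send every message from one source to its topic using kafka-python."""
    from kafka import KafkaProducer  # imported here so the helpers above need no broker

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers, value_serializer=serialize)
    topic = topic_for(source)
    for message in messages:
        producer.send(topic, message)  # asynchronous: the message is buffered first
    producer.flush()  # block until all buffered messages are delivered
    producer.close()


if __name__ == "__main__":
    publish([{"text": "Kafka makes streaming easy"}], source="twitter")
```

Calling `flush()` before exiting matters for short-lived scripts, because `send()` only buffers messages and delivers them in the background.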

The KafkaConsumer then calls the Machine Learning service to classify the sentiment of the news articles and to detect hate speech in the tweets.
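Here is a sketch of the consumer side, again with kafka-python. `handle()` picks the task from the topic (sentiment for news, hate speech detection for tweets) and merges the result into the message. The two classifier callables are placeholders for whatever ML service you plug in (Expert.ai in this project); their return shape, the topic names, and the `ml-service` group id are assumptions.

```python
import json
from typing import Callable

# A classifier takes raw text and returns a dict of results (assumed shape).
Classifier = Callable[[str], dict]


def handle(topic: str, message: dict, classify_sentiment: Classifier, detect_hate: Classifier) -> dict:
    """Attach the ML result that matches the topic: sentiment for news, hate detection for tweets."""
    if topic == "news":
        result = {"task": "sentiment", **classify_sentiment(message["text"])}
    elif topic == "tweets":
        result = {"task": "hate_speech", **detect_hate(message["text"])}
    else:
        raise ValueError(f"unexpected topic: {topic!r}")
    return {**message, "ml": result}


def consume(classify_sentiment: Classifier, detect_hate: Classifier,
            bootstrap_servers: str = "localhost:9092"):
    """Yield enriched messages from both topics using kafka-python's KafkaConsumer."""
    from kafka import KafkaConsumer  # lazy import: handle() can be tested without Kafka

    consumer = KafkaConsumer(
        "tweets", "news",  # hypothetical topic names
        bootstrap_servers=bootstrap_servers,
        group_id="ml-service",  # hypothetical consumer group
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for record in consumer:  # each record is a ConsumerRecord with .topic and .value
        yield handle(record.topic, record.value, classify_sentiment, detect_hate)
```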

Machine Learning service

Expert.ai turns language into data so teams can make better decisions.

Since I built this project as part of the Expert.ai hackathon, I used their API for sentiment analysis and hate speech detection.

However, you can always use your own TensorFlow or PyTorch model. Hugging Face also has some very relevant models for sentiment classification, and they are straightforward to set up. You should check them out!
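If you go the Hugging Face route instead of Expert.ai, a sentiment classifier could look roughly like this. This is a swapped-in technique, not what the project uses. It assumes the `transformers` package and the public `distilbert-base-uncased-finetuned-sst-2-english` checkpoint, which only predicts POSITIVE or NEGATIVE; hate speech detection would need a different model.

```python
def normalize_prediction(prediction: dict) -> dict:
    """Turn one Hugging Face prediction ({'label': ..., 'score': ...}) into a flat record."""
    return {"label": prediction["label"].lower(), "score": round(float(prediction["score"]), 4)}


def make_sentiment_classifier(model_name: str = "distilbert-base-uncased-finetuned-sst-2-english"):
    """Build a text -> {'label', 'score'} function backed by a transformers pipeline."""
    from transformers import pipeline  # heavy import, loaded only when the classifier is built

    clf = pipeline("sentiment-analysis", model=model_name)

    def classify(text: str) -> dict:
        # The pipeline returns a list with one prediction dict per input text.
        return normalize_prediction(clf(text)[0])

    return classify


if __name__ == "__main__":
    classify = make_sentiment_classifier()
    print(classify("Markets rallied after the announcement."))
```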

Specifically, I used the Sentiment Analysis and Hate Speech Detection APIs of the Expert.ai NL API.

Elasticsearch

Okay, we have the classified data. Now what?

We have to store that data somewhere for further analytics. I used Elasticsearch to store it and Kibana to visualize it.

You might ask, why Kibana?

Let me introduce you to the ELK stack.

"ELK is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a stash like Elasticsearch. Kibana lets users visualize data with charts and graphs in Elasticsearch." (Elastic.co)

Elasticsearch, Logstash, and Kibana go hand in hand in most data engineering and data ingestion use cases. However, I omitted Logstash to keep the pipeline simple and focused on its goal.

That said, you can always add Logstash and scale the pipeline further as needed.
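With Logstash left out, the consumer itself writes each classified message into Elasticsearch. A minimal sketch using the official elasticsearch-py 8.x client is shown below; the index name, the flat document layout, and the assumption that every classifier result carries `task`, `label`, and `score` keys are all hypothetical.

```python
from datetime import datetime, timezone


def to_document(message: dict) -> dict:
    """Flatten an enriched message (assumed to carry an 'ml' dict) into an Elasticsearch document."""
    ml = message.get("ml", {})
    return {
        "text": message["text"],
        "source": message.get("source"),
        "task": ml.get("task"),
        "label": ml.get("label"),
        "score": ml.get("score"),
        # Ingestion timestamp (UTC, ISO 8601) so Kibana's time picker has a field to filter on.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def store(documents, index: str = "ml-results", url: str = "http://localhost:9200") -> None:
    """Index each document with the official elasticsearch-py 8.x client."""
    from elasticsearch import Elasticsearch  # lazy import: to_document() needs no cluster

    es = Elasticsearch(url)
    for doc in documents:
        es.index(index=index, document=doc)
```

Indexing one document per call is fine for a demo; for higher volumes, the client's `elasticsearch.helpers.bulk` helper batches many documents into a single request.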

That is enough about the ELK stack. Let's jump into the Elasticsearch design.


Elasticsearch has "indexes", which play a role similar to databases. Each index stores documents whose structure is defined by a mapping, which is similar to a schema in other databases.

The mapping describes the fields in the JSON documents, their data types, and how they should be indexed.
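To make the mapping idea concrete, here is a hypothetical mapping for the classified messages, created with elasticsearch-py 8.x. The field names and index name are illustrative. The main design choice is `text` versus `keyword`: `text` fields are analyzed for full-text search, while `keyword` fields keep exact values, which is what Kibana needs for filters and terms-based charts such as a sentiment breakdown.

```python
# A hypothetical mapping for the classified messages; field names are illustrative.
RESULTS_MAPPING = {
    "properties": {
        "text": {"type": "text"},          # analyzed for full-text search
        "source": {"type": "keyword"},     # exact values, good for filters and terms charts
        "task": {"type": "keyword"},
        "label": {"type": "keyword"},
        "score": {"type": "float"},
        "ingested_at": {"type": "date"},   # ISO 8601 strings match the default date format
    }
}


def create_index(index: str = "ml-results", url: str = "http://localhost:9200") -> None:
    """Create the index with an explicit mapping unless it already exists (elasticsearch-py 8.x)."""
    from elasticsearch import Elasticsearch  # lazy import: the mapping dict needs no cluster

    es = Elasticsearch(url)
    if not es.indices.exists(index=index):
        es.indices.create(index=index, mappings=RESULTS_MAPPING)
```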

[Image: Databases ~ Indexes]

The above image shows how Elasticsearch indexes compare to databases in MySQL or PostgreSQL.

Kibana

Done storing the data in the Elasticsearch indexes? Great! We can finally use it to build visualizations and get more insights from the data.

We use Kibana for that.


Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack.
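Before Kibana can chart anything, it needs a data view (formerly called an index pattern) pointing at the indexes. You can create one in the UI, or through the Kibana 8.x data views REST API, roughly as sketched below. The Kibana URL, the `ml-results*` pattern, and the `ingested_at` time field are assumptions, and the request carries no authentication, so it only works against a local setup with security disabled.

```python
import json
import urllib.request


def data_view_request(kibana_url: str, index_pattern: str, time_field: str) -> urllib.request.Request:
    """Build a Kibana 8.x request that creates a data view over the matching indexes."""
    body = {"data_view": {"title": index_pattern, "timeFieldName": time_field}}
    return urllib.request.Request(
        f"{kibana_url.rstrip('/')}/api/data_views/data_view",
        data=json.dumps(body).encode("utf-8"),
        # Kibana requires the kbn-xsrf header on API requests that modify state.
        headers={"Content-Type": "application/json", "kbn-xsrf": "true"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumes Kibana on localhost:5601 with security disabled (no auth header).
    req = data_view_request("http://localhost:5601", "ml-results*", "ingested_at")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.load(resp))
```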

[Image: Kibana dashboard]

This is what my final Kibana dashboard looks like. You can check out the code at my GitHub repo.

Feel free to leave a star if you like the project.

This part covers only the idea behind the project and its architecture. I'll soon add the coding section in a separate part, so stay tuned.


That's all, folks. See you soon 👋

Happy coding.
