How to Create a Powerful Text Ranker Using NodeJS, Redis and Docker
Ajeet Singh Raina
Posted on December 22, 2021
Living in a world where most people struggle to leave the house without their smartphone has meant Google is never too far away from us. In an instant, we can find an answer to almost any question through a search query on Google’s search engine.
Search engines, and the vast amount of information they give access to, have become part of everyday life, and many of us rely on them heavily. Because Google and other tech giants have set such high standards, a search engine has to return answers to queries almost instantly to be considered good.
Any lags, delays, or drags will hamper the user’s experience, which is why this Launchpad App, Alexis, has used Redis as the main database to overcome this obstacle.
The founder of this application, Bobby Donchev, has leveraged the power of RedisAI and RediSearch to retrieve information from a corpus in response to a query with maximum efficiency. Users are able to index PDFs and use a simple UI to extract information from their documents.
Without Redis, the entire search process would be sluggish, hampering the functionality of Alexis. Let’s take a look at how Bobby put this application together.
However, before moving on, we’d like to highlight that we also have an awesome range of applications for you to check out on the Redis Launchpad. So make sure to check it out!
- What will you build?
- What will you need?
- Architecture
- Getting started
- How the data is stored
- How the data is accessed
- How it works
1. What will you build?
You’ll build an efficient text ranker capable of answering search queries at maximum speed. Users will be able to leverage this application to index important PDFs and extract answers from their documents with ease.
We’ll go through each step in chronological order and highlight what components are required to build the application.
2. What will you need?
- RediSearch: provides indexing, querying and a full-text search engine
- RedisAI: executes deep learning/machine learning models to manage data and decrease latency
- Redis Streams: manages data consumption
- NodeJS: an open-source, cross-platform runtime that executes JavaScript code outside a web browser
- RedisJSON: implements ECMA-404 The JSON Data Interchange Standard as a native data type.
3. Architecture
Providing an answer to the searcher’s query happens in two steps:
- First, you select the text that’s likely to contain the answer. This step uses RediSearch with the BM25 ranking function.
- Second, you use a Transformer AI model loaded into RedisAI to identify the answer spans in that text.
Running RediSearch first drastically reduces the search space for the model, which makes the app faster overall. The backend is written in NodeJS with TypeScript, and the frontend in React with TypeScript.
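To make the first step more concrete, here is a minimal TypeScript sketch of BM25 scoring over a small in-memory corpus. It is not RediSearch's implementation, which runs server-side; the naive tokenizer, the parameter values k1 = 1.2 and b = 0.75, and the function names are all assumptions chosen for illustration.

```typescript
// Illustrative BM25 scorer. RediSearch computes its ranking server-side;
// this sketch only shows the idea on an in-memory corpus.

// Assumption: a naive lowercase, alphanumeric tokenizer.
function tokenize(text: string): string[] {
  const matches = text.toLowerCase().match(/[a-z0-9]+/g);
  return matches === null ? [] : matches;
}

// Returns one BM25 score per document, in corpus order.
function bm25Scores(query: string, corpus: string[], k1 = 1.2, b = 0.75): number[] {
  const docs = corpus.map(tokenize);
  const n = docs.length;
  const avgLen = n === 0 ? 0 : docs.reduce((sum, d) => sum + d.length, 0) / n;

  // Document frequency: in how many documents each query term occurs.
  const terms = tokenize(query);
  const df: Record<string, number> = {};
  terms.forEach((t) => {
    df[t] = docs.filter((d) => d.indexOf(t) !== -1).length;
  });

  return docs.map((doc) => {
    let score = 0;
    terms.forEach((t) => {
      const tf = doc.filter((w) => w === t).length;
      if (tf === 0) return; // term absent: contributes nothing
      // Smoothed IDF variant that stays non-negative.
      const idf = Math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5));
      // Term frequency saturates with k1 and is normalized by document length via b.
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    });
    return score;
  });
}
```

Documents that share no term with the query score exactly zero, which is what lets this stage discard most of the corpus before the slower model runs.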
Besides using RedisAI and RediSearch, you’ll be leveraging RedisJSON for your user model as well as an asynchronous worker implemented with Redis Streams.
The web server is built with the Express framework and exposes the following endpoints:
POST /v1/users
POST /v1/login
POST /v1/logout
GET /v1/me
POST /pdf (pdfUpload)
POST /v1/query
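The endpoint list above can be written down as a small typed route table. This is only a sketch: apart from pdfUpload, which the list itself mentions, the handler names are hypothetical, and the real app registers its routes through Express rather than through a lookup function like this one.

```typescript
// Route table mirroring the endpoint list. Handler names other than
// "pdfUpload" are hypothetical and used for illustration only.
type HttpMethod = "GET" | "POST";

interface Route {
  method: HttpMethod;
  path: string;
  handler: string;
}

const routes: Route[] = [
  { method: "POST", path: "/v1/users", handler: "createUser" },
  { method: "POST", path: "/v1/login", handler: "login" },
  { method: "POST", path: "/v1/logout", handler: "logout" },
  { method: "GET", path: "/v1/me", handler: "currentUser" },
  { method: "POST", path: "/pdf", handler: "pdfUpload" },
  { method: "POST", path: "/v1/query", handler: "answerQuery" },
];

// Returns the matching route, or undefined when none is registered.
function findRoute(method: string, path: string): Route | undefined {
  return routes.filter((r) => r.method === method && r.path === path)[0];
}
```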
Once you register and log into the app, you’ll be able to start adding documents to the indexed library. When a PDF is uploaded, an event is written to a Redis stream. Afterwards, a consumer from the consumer group picks up the event for asynchronous processing.
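The upload flow described above maps onto four Redis Streams commands. The sketch below only builds the raw command argument lists, so it runs without a Redis connection; the stream key, the group name and the event field names are assumptions, not values taken from the Alexis code base.

```typescript
// Builds raw Redis command argument lists for the upload event flow.
// Assumption: the stream key, group name and field names are hypothetical.
const STREAM_KEY = "ax:stream:pdf-uploads";
const GROUP = "pdf-workers";

// XADD <key> * <field> <value> ...: append an event with an auto-generated ID.
function xaddUploadEvent(userId: string, pdfId: string, fileName: string): string[] {
  return ["XADD", STREAM_KEY, "*", "userId", userId, "pdfId", pdfId, "fileName", fileName];
}

// XGROUP CREATE <key> <group> $ MKSTREAM: create the consumer group,
// creating the stream too if it does not exist yet.
function xgroupCreate(): string[] {
  return ["XGROUP", "CREATE", STREAM_KEY, GROUP, "$", "MKSTREAM"];
}

// XREADGROUP ... STREAMS <key> >: fetch events never delivered to this group,
// blocking for up to blockMs milliseconds when the stream is empty.
function xreadGroup(consumer: string, count: number, blockMs: number): string[] {
  return [
    "XREADGROUP", "GROUP", GROUP, consumer,
    "COUNT", String(count), "BLOCK", String(blockMs),
    "STREAMS", STREAM_KEY, ">",
  ];
}

// XACK <key> <group> <id>: mark an event as processed.
function xack(eventId: string): string[] {
  return ["XACK", STREAM_KEY, GROUP, eventId];
}
```

Most Node Redis clients expose a generic way to send a raw command, so these lists can be passed through as-is. Acknowledging with XACK only after processing succeeds means an event that a crashed worker never finished stays in the group's pending list and can be retried.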
The worker then processes the PDF, applies some cleaning and stores the PDF content in a Redis hash that’s indexed with RediSearch. You’ll now be able to send natural-language queries to the server and won’t be confined to basic keyword searches such as ‘kubernetes deployments,’ ‘DDD root aggregate’ etc.
Instead, you’ll be able to phrase your queries as full questions and get more relevant results.
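The article does not say what the cleaning step involves, so the following is a sketch under assumptions: it repairs a few common PDF text-extraction artifacts and then builds the HSET command that would store the result. The content field matches the index schema shown later in this article, while the key layout ax:pdfs:<userId>:<pdfId> is a guess that merely fits the index prefix.

```typescript
// Assumption: "cleaning" means repairing common PDF text-extraction artifacts.
function cleanPdfText(raw: string): string {
  return raw
    .replace(/(\w)-\s*\n\s*(\w)/g, "$1$2") // rejoin words hyphenated across line breaks
    .replace(/[\x00-\x08\x0B-\x1F\x7F]/g, " ") // replace control characters with spaces
    .replace(/\s+/g, " ") // collapse runs of whitespace, including newlines
    .trim();
}

// HSET <key> content <text>: store the cleaned text in a hash.
// Assumption: the ":<pdfId>" key suffix is hypothetical.
function hsetPdfArgs(userId: string, pdfId: string, rawText: string): string[] {
  return ["HSET", "ax:pdfs:" + userId + ":" + pdfId, "content", cleanPdfText(rawText)];
}
```

Rejoining hyphenated words before indexing matters for search quality, because a word left split across a line break would otherwise be indexed as two unrelated tokens.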
Flowchart
Below is a general overview of how Alexis functions.
Now let’s break down how the Upload PDFs & Index PDF Content and the Answer Query parts of the flowchart operate.
Answer Query
- The user enters a query in the UI, which is then sent to RediSearch.
- RediSearch uses the query’s keywords and the BM25 ranking function to find the most relevant content.
- This content is then passed to RedisAI, along with the query, so that the model can decide which answer is most relevant to the user’s query.
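The three steps above amount to a retrieve-then-read pipeline, which can be sketched as follows. The Retriever and Reader types are hypothetical stand-ins for the RediSearch call and the RedisAI model call; injecting them as functions keeps the sketch runnable without either service.

```typescript
// Two-stage question answering: retrieve candidate passages,
// then let a reader model pick the best answer among them.
interface Passage {
  id: string;
  text: string;
}

interface Answer {
  passageId: string;
  text: string;
  score: number; // model confidence, higher is better
}

// Stage 1: hypothetical stand-in for a RediSearch query ranked with BM25.
type Retriever = (query: string, limit: number) => Promise<Passage[]>;
// Stage 2: hypothetical stand-in for a Transformer model served by RedisAI.
type Reader = (query: string, passage: Passage) => Promise<Answer>;

async function answerQuery(
  query: string,
  retrieve: Retriever,
  read: Reader,
  limit = 3
): Promise<Answer | null> {
  const passages = await retrieve(query, limit);
  if (passages.length === 0) return null; // nothing relevant was indexed

  // The model only sees the few passages that survived stage 1.
  const answers = await Promise.all(passages.map((p) => read(query, p)));
  return answers.reduce((best, a) => (a.score > best.score ? a : best));
}
```

The limit parameter is the knob that trades accuracy for latency: a larger value gives the model more passages to read, at the cost of more inference calls per query.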
Upload PDFs & Index PDF Content
- A user types a question into the search engine.
- RediSearch indexes the PDF(s) and searches for an answer to this query.
- RedisAI runs an inference and pulls a number of possible answers.
- RedisAI then compares each answer and decides which one is most relevant to the query.
- The answer is finally displayed to the user.
4. Getting started
Step 1: Install the prerequisites
- Node - v12.x.x
- NPM - v6.x.x
- Docker and Docker Compose
Step 2: Clone the repository
git clone https://github.com/redis-developer/alexis
Step 3: Install the dependencies
Change to the alexis directory and run the command below:
npm install
Step 4: Set up the frontend and backend
The command below bootstraps the server and client apps and starts the Redis server along with the RedisInsight GUI:
npm run bootstrap
Step 5: Start the application
npm start
Step 6: Access the application
Open http://localhost:3000 to access the application.
Step 7: Access RedisInsight
RedisInsight is a visual tool that lets you interact with your Redis database through both a GUI and a CLI while developing your Redis-based application. It is a full-featured desktop GUI client that helps you design, develop and optimize your Redis application.
The RedisInsight GUI is accessible via the following link: http://localhost:8001
5. How the data is stored
Step 1: The user data is stored in a RedisJSON document
{
firstName: string
lastName: string
email: string
password: string
pdfs: Array<{id: string, fileName: string}>
}
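The shape above translates directly into TypeScript interfaces, and storing it comes down to a single JSON.SET command. This sketch builds the command arguments only; the ax:users: key prefix is an assumption, and the article does not say whether the password field holds a hash, so the comment on it is a recommendation rather than a description of Alexis.

```typescript
interface PdfRef {
  id: string;
  fileName: string;
}

interface User {
  firstName: string;
  lastName: string;
  email: string;
  password: string; // assumption: should hold a password hash, never plain text
  pdfs: PdfRef[];
}

// JSON.SET <key> $ <json>: store the whole document at the root path.
// Assumptions: the "ax:users:" key prefix is hypothetical, and "$" is the
// root path in RedisJSON 2.x (1.x uses "." instead).
function jsonSetUserArgs(userId: string, user: User): string[] {
  return ["JSON.SET", "ax:users:" + userId, "$", JSON.stringify(user)];
}
```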
Step 2: A RediSearch index is created for each user with the command below
FT.CREATE ax:idx:<userId> on HASH PREFIX 1 ax:pdfs:<userId>
SCHEMA content TEXT PHONETIC dm:en
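Because the index name and key prefix depend on the user ID, the command has to be assembled per user. The sketch below builds the FT.CREATE arguments shown above, plus a matching FT.SEARCH that asks for the BM25 scorer. The function names and the default limit are assumptions, and escaping of RediSearch query-syntax characters in the user's text is deliberately left out.

```typescript
// FT.CREATE for one user's index, with the same arguments as the command above.
function ftCreateArgs(userId: string): string[] {
  return [
    "FT.CREATE", `ax:idx:${userId}`,
    "ON", "HASH",
    "PREFIX", "1", `ax:pdfs:${userId}`,
    "SCHEMA", "content", "TEXT", "PHONETIC", "dm:en",
  ];
}

// FT.SEARCH against that index, ranked with BM25 and returning scores.
// Assumption: the query is passed through unescaped, so characters with
// special meaning in RediSearch query syntax would need handling first.
function ftSearchArgs(userId: string, query: string, limit = 5): string[] {
  return [
    "FT.SEARCH", `ax:idx:${userId}`, query,
    "SCORER", "BM25",
    "WITHSCORES",
    "LIMIT", "0", String(limit),
  ];
}
```

One index per user keeps each search scoped to that user's own documents without needing an extra filter in every query.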