How to Create a Powerful Text Ranker Using NodeJS, Redis and Docker

In a world where most people struggle to leave the house without their smartphone, Google is never far away. In an instant, we can find an answer to almost any question by typing a query into Google’s search engine.

Search engines, and the vast amount of information they give us access to, are part of everyday life, and many of us rely on them heavily. To meet the high standards set by Google and other tech giants, a search engine must return answers to queries almost instantly.

Any lag or delay hurts the user’s experience, which is why Alexis, an app on the Redis Launchpad, uses Redis as its main database to overcome this obstacle.

Bobby Donchev, the founder of Alexis, used RedisAI and RediSearch to retrieve information from a corpus in response to a query as efficiently as possible. Users can index PDFs and extract information from their documents through a simple UI.

Without Redis, the search process would likely be much slower, hampering the functionality of Alexis. Let’s take a look at how Bobby put this application together.

Before moving on, note that the Redis Launchpad has a wide range of other applications for you to explore, so make sure to check it out!

What will you build?
What will you need?
Architecture
Getting started
How the data is stored
How the data is accessed
How it works

1. What will you build?

You’ll build an efficient text ranker that quickly retrieves answers to search queries. Users can use the application to index important PDFs and easily extract answers from their documents.

We’ll go through each step in chronological order and highlight what components are required to build the application.

2. What will you need?

RediSearch: secondary indexing, querying, and full-text search
RedisAI: runs deep learning/machine learning models inside Redis, close to the data, to reduce latency
Redis Streams: queues upload events for asynchronous processing
NodeJS: an open-source, cross-platform runtime that executes JavaScript code outside a web browser
RedisJSON: implements the ECMA-404 JSON Data Interchange Standard as a native data type

3. Architecture

Providing an answer to the searcher’s query happens in two steps:

First, select the text that is likely to contain the answer. For this step, you’ll use RediSearch with the BM25 ranking function.
Second, use a Transformer AI model loaded into RedisAI to identify the answer spans within that text.
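To make the first step concrete, here is a minimal in-memory sketch of Okapi BM25 in TypeScript. It is not RediSearch’s implementation: the tokenizer, the Lucene-style IDF variant, and the defaults k1 = 1.2 and b = 0.75 are assumptions, so the scores will not match RediSearch’s exactly. It only shows why passages that share rare query terms rise to the top.

```typescript
// Split text into lowercase alphanumeric terms (a simplified tokenizer;
// RediSearch applies its own tokenization, stemming and stop-word rules).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

interface ScoredDoc {
  id: number; // index of the document in the input array
  score: number; // BM25 score for the query
}

// Rank documents against a query with Okapi BM25.
// k1 controls term-frequency saturation; b controls document-length normalization.
function bm25Rank(docs: string[], query: string, k1 = 1.2, b = 0.75): ScoredDoc[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  if (n === 0) return [];
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / n;
  const queryTerms = Array.from(new Set(tokenize(query)));

  // Document frequency: how many documents contain each query term.
  const docFreq = new Map<string, number>();
  for (const term of queryTerms) {
    docFreq.set(term, tokenized.filter((d) => d.includes(term)).length);
  }

  const scored = tokenized.map((doc, id) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docFreq.get(term) ?? 0;
      // Lucene-style IDF: stays positive even for terms found in most documents.
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return { id, score };
  });

  // Highest score first.
  return scored.sort((x, y) => y.score - x.score);
}

const passages = [
  "Kubernetes deployments roll out new pods gradually.",
  "An aggregate root guards the consistency of a DDD aggregate.",
  "Redis Streams support consumer groups.",
];
const ranking = bm25Rank(passages, "How do Kubernetes deployments work?");
console.log(ranking[0].id); // 0
```

Only the top-ranked passages would then be handed to the Transformer model, which is what keeps the expensive second step small.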

Using RediSearch in the first step drastically reduces the amount of text the Transformer model has to process, which makes the app faster overall. The backend is written in NodeJS with TypeScript, and the frontend in React with TypeScript.

Besides RedisAI and RediSearch, you’ll use RedisJSON for the user model and an asynchronous worker implemented with Redis Streams.

The web server is built with the Express framework and exposes the following endpoints:



POST /v1/users
POST /v1/login
POST /v1/logout
GET /v1/me

POST /pdf  (pdfUpload)
POST /v1/query

Once you register and log in to the app, you can start adding documents to your indexed library. When a PDF is uploaded, an event is written to a Redis stream, and a consumer in the consumer group picks up the event for asynchronous processing.
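The upload flow just described can be sketched as raw Redis Streams commands. The TypeScript below only builds argument arrays, which you could pass to a Redis client’s raw-command method (for example `sendCommand` in node-redis v4). The stream key, group name, and event fields are hypothetical and are not taken from the Alexis source.

```typescript
// Hypothetical key, group and field names; the Alexis repository may use others.
const STREAM_KEY = "ax:stream:pdf-uploads";
const GROUP = "pdf-workers";

interface PdfUploadEvent {
  userId: string;
  pdfId: string;
  fileName: string;
}

// Producer: the upload endpoint appends one entry per PDF ("*" lets Redis pick the entry ID).
function xaddArgs(e: PdfUploadEvent): string[] {
  return ["XADD", STREAM_KEY, "*", "userId", e.userId, "pdfId", e.pdfId, "fileName", e.fileName];
}

// One-time setup: create the consumer group starting at the end of the stream ("$"),
// creating the stream itself if it does not exist yet (MKSTREAM).
function xgroupCreateArgs(): string[] {
  return ["XGROUP", "CREATE", STREAM_KEY, GROUP, "$", "MKSTREAM"];
}

// Worker: read entries never delivered to any consumer in the group (">"),
// blocking for up to blockMs milliseconds when nothing is available.
function xreadgroupArgs(consumer: string, count = 1, blockMs = 5000): string[] {
  return [
    "XREADGROUP", "GROUP", GROUP, consumer,
    "COUNT", String(count), "BLOCK", String(blockMs),
    "STREAMS", STREAM_KEY, ">",
  ];
}

// Acknowledge an entry after processing so it leaves the group's pending entries list.
function xackArgs(entryId: string): string[] {
  return ["XACK", STREAM_KEY, GROUP, entryId];
}

// Stream entries come back as flat [field, value, field, value, ...] arrays.
function parseEntry(fields: string[]): PdfUploadEvent {
  const map = new Map<string, string>();
  for (let i = 0; i + 1 < fields.length; i += 2) {
    map.set(fields[i], fields[i + 1]);
  }
  return {
    userId: map.get("userId") ?? "",
    pdfId: map.get("pdfId") ?? "",
    fileName: map.get("fileName") ?? "",
  };
}

const upload: PdfUploadEvent = { userId: "u1", pdfId: "p1", fileName: "ddd.pdf" };
// Round trip: the field/value pairs written by XADD parse back into the same event.
console.log(parseEntry(xaddArgs(upload).slice(3)));
```

Acknowledging only after processing means a worker that crashes mid-PDF leaves the entry pending, so it can be inspected or claimed by another consumer later.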

The worker then processes the PDF, cleans the extracted text, and stores it in a Redis hash that is indexed with RediSearch. From then on, you can send natural-language queries to the server, and you won’t be confined to basic keyword searches such as ‘kubernetes deployments’ or ‘DDD root aggregate’.

Instead, you can ask full questions, such as ‘How do I roll back a Kubernetes deployment?’, and get back the most relevant passages.
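The cleaning and storage step could look roughly like the sketch below. The cleaning rules are assumptions rather than the ones Alexis applies, and while the `ax:pdfs:<userId>` prefix comes from the index definition later in this post, the `:<pdfId>` key suffix and the `fileName` hash field are hypothetical. Only the `content` field is part of the index schema.

```typescript
// Assumed cleaning steps; the real Alexis worker may clean PDF text differently.
function cleanPdfText(raw: string): string {
  return raw
    .replace(/-\n(?=\w)/g, "") // re-join words hyphenated across line breaks
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, " ") // replace control characters
    .replace(/\s+/g, " ") // collapse whitespace and newlines into single spaces
    .trim();
}

// Build an HSET command for one PDF. Keys must start with ax:pdfs:<userId> to be
// picked up by that user's index; the ":<pdfId>" suffix is a hypothetical layout.
function hsetArgs(userId: string, pdfId: string, fileName: string, rawText: string): string[] {
  return ["HSET", `ax:pdfs:${userId}:${pdfId}`, "fileName", fileName, "content", cleanPdfText(rawText)];
}

console.log(hsetArgs("u1", "p1", "intro.pdf", "Redis Str-\neams\n\n  are   fast\u0007"));
// [ 'HSET', 'ax:pdfs:u1:p1', 'fileName', 'intro.pdf', 'content', 'Redis Streams are fast' ]
```

Because RediSearch indexes hashes automatically when their keys match an index prefix, the worker does not need a separate "add to index" call after the HSET.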

Flowchart

Below is a general overview of how Alexis functions.

Now let’s break down how the Answer Query and Upload PDFs & Index PDF Content parts of the flowchart work.

Answer Query

The user enters a query in the UI, and the query is sent to RediSearch.
RediSearch uses the BM25 ranking function to score the indexed content against the query’s keywords and returns the most relevant passages.
These passages are then sent to RedisAI along with the query, and the Transformer model determines which answer is most relevant to the user’s query.
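The RediSearch half of this flow could be issued as a single FT.SEARCH call against the per-user index. This sketch only builds the argument array. Turning the question into a union query (`a|b|c`, so a passage may match any keyword and BM25 decides the order), dropping one-character tokens, and returning the top five passages are all assumptions, not documented Alexis behavior. `SCORER BM25` selects the BM25 scorer explicitly and `WITHSCORES` returns each passage’s score.

```typescript
// Turn a natural-language question into a RediSearch union query ("a|b|c").
// Dropping one-character tokens is an assumption made for this sketch.
function toUnionQuery(question: string): string {
  const terms = question
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 1);
  return Array.from(new Set(terms)).join("|");
}

// Build FT.SEARCH arguments against the per-user index ax:idx:<userId>,
// returning the top k passages (content field only) with their BM25 scores.
function ftSearchArgs(userId: string, question: string, k = 5): string[] {
  const query = toUnionQuery(question);
  if (query === "") throw new Error("question contains no searchable terms");
  return [
    "FT.SEARCH", `ax:idx:${userId}`, query,
    "WITHSCORES",
    "RETURN", "1", "content",
    "SCORER", "BM25",
    "LIMIT", "0", String(k),
  ];
}

console.log(ftSearchArgs("u1", "How do I scale Kubernetes deployments?").join(" "));
// FT.SEARCH ax:idx:u1 how|do|scale|kubernetes|deployments WITHSCORES RETURN 1 content SCORER BM25 LIMIT 0 5
```

The returned `content` values, together with the original question, are what would be passed on to the model in RedisAI.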

Upload PDFs & Index PDF Content

The user uploads one or more PDFs, and RediSearch indexes their content.
The user types a question into the search engine.
RediSearch searches the indexed content for passages that may answer the question.
RedisAI runs inference on those passages and extracts a number of possible answers.
RedisAI then compares the candidates and selects the one most relevant to the query.
The answer is displayed to the user.
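The “extract possible answers, then compare them” step resembles standard extractive question answering. The sketch below assumes, without confirmation from the source, that the Transformer in RedisAI returns per-token start and end logits for each passage. It picks the span maximizing start + end logit in each passage, then keeps the best span overall. Production pipelines usually also mask question and special tokens, and raw logits from different passages are only roughly comparable.

```typescript
// One passage after inference: its tokens plus per-token start/end logits
// (an assumed output format for an extractive question-answering model).
interface PassageLogits {
  tokens: string[];
  startLogits: number[];
  endLogits: number[];
}

interface Answer {
  passage: number; // index of the passage the answer came from
  text: string;
  score: number; // startLogit + endLogit of the chosen span
}

// Find the span [start, end] maximizing startLogits[start] + endLogits[end],
// with end >= start and at most maxLen tokens.
function bestSpan(p: PassageLogits, maxLen: number): { start: number; end: number; score: number } {
  let best = { start: 0, end: 0, score: -Infinity };
  for (let i = 0; i < p.startLogits.length; i++) {
    const last = Math.min(p.endLogits.length, i + maxLen);
    for (let j = i; j < last; j++) {
      const score = p.startLogits[i] + p.endLogits[j];
      if (score > best.score) best = { start: i, end: j, score };
    }
  }
  return best;
}

// Compare the best span of every passage and keep the highest-scoring one.
function pickAnswer(passages: PassageLogits[], maxLen = 30): Answer | null {
  let answer: Answer | null = null;
  for (let idx = 0; idx < passages.length; idx++) {
    const p = passages[idx];
    const span = bestSpan(p, maxLen);
    if (span.score === -Infinity) continue; // empty passage, nothing to extract
    if (answer === null || span.score > answer.score) {
      answer = {
        passage: idx,
        text: p.tokens.slice(span.start, span.end + 1).join(" "),
        score: span.score,
      };
    }
  }
  return answer;
}

const result = pickAnswer([
  {
    tokens: ["redis", "streams", "use", "consumer", "groups"],
    startLogits: [0.1, 0.2, 0.1, 3.0, 0.5],
    endLogits: [0.1, 0.1, 0.2, 0.4, 2.5],
  },
  { tokens: ["pdfs", "are", "indexed"], startLogits: [0.5, 0.1, 0.2], endLogits: [0.1, 0.1, 0.9] },
]);
console.log(result); // passage 0, text "consumer groups", score 5.5
```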

4. Getting started

Step 1: Install the prerequisites

Node - v12.x.x
NPM - v6.x.x
Docker and Docker-compose

Step 2: Clone the repository



 git clone https://github.com/redis-developer/alexis

Step 3: Install the dependencies

Change into the alexis directory and run the following command:



 npm install

Step 4: Set up the frontend and backend

The following command bootstraps the server and client apps and also initializes a Redis server as well as the RedisInsight GUI:



 npm run bootstrap

Step 5: Start the application



 npm start

Step 6: Access the application

Open http://localhost:3000 to access the application

Step 7: Access RedisInsight

RedisInsight is a visual tool that lets you interact with your Redis database through both a GUI and a CLI, and much more, while you develop your Redis-based application. It is a full-featured desktop GUI client for designing, developing, and optimizing Redis applications.

The RedisInsight GUI is accessible via the following link: http://localhost:8001

5. How the data is stored

Step 1: The user data is stored as a RedisJSON document with the following shape:



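 // Shape of each user document stored in RedisJSON (interface name is illustrative)
 interface User {
   firstName: string;
   lastName: string;
   email: string;
   password: string;
   pdfs: Array<{ id: string; fileName: string }>;
 }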
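Writing that document could look like the sketch below, which builds JSON.SET and JSON.ARRAPPEND arguments for a Redis client’s raw-command method. The `ax:users:<userId>` key layout is hypothetical, and so is the reading of `password` as a stored hash; the source does not say how Alexis names user keys or handles passwords. JSON.ARRAPPEND lets the worker register a new PDF without rewriting the whole document.

```typescript
interface PdfRef {
  id: string;
  fileName: string;
}

interface UserDocument {
  firstName: string;
  lastName: string;
  email: string;
  password: string; // assumed to hold a password hash rather than the plaintext
  pdfs: PdfRef[];
}

// Hypothetical key layout for user documents.
function userKey(userId: string): string {
  return `ax:users:${userId}`;
}

// Create or overwrite the whole document at the root path "$".
function jsonSetArgs(userId: string, user: UserDocument): string[] {
  return ["JSON.SET", userKey(userId), "$", JSON.stringify(user)];
}

// Append one uploaded PDF to the user's pdfs array in place.
function jsonArrAppendArgs(userId: string, pdf: PdfRef): string[] {
  return ["JSON.ARRAPPEND", userKey(userId), "$.pdfs", JSON.stringify(pdf)];
}

const user: UserDocument = {
  firstName: "Ada",
  lastName: "Lovelace",
  email: "ada@example.com",
  password: "<password hash>",
  pdfs: [],
};
console.log(jsonSetArgs("u1", user));
console.log(jsonArrAppendArgs("u1", { id: "p1", fileName: "ddd.pdf" }));
```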

Step 2: A RediSearch index is created for each user with the following command:



 FT.CREATE ax:idx:<userId> ON HASH PREFIX 1 ax:pdfs:<userId>
 SCHEMA content TEXT PHONETIC dm:en
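Because each user gets an index, the command is usually generated per user at registration time. The sketch below builds it as an argument array. One detail worth checking: RediSearch matches PREFIX as a plain string prefix of the key name, so `ax:pdfs:u1` would also cover keys belonging to a user `u10`. Adding a trailing `:` to the prefix, as this sketch does, is a suggested change and not necessarily what Alexis does; it assumes PDF keys look like `ax:pdfs:<userId>:<pdfId>`.

```typescript
// Build the per-user FT.CREATE command. The trailing ":" on the prefix is a
// suggested change so one user's index cannot pick up another user's keys.
function ftCreateArgs(userId: string): string[] {
  return [
    "FT.CREATE", `ax:idx:${userId}`,
    "ON", "HASH",
    "PREFIX", "1", `ax:pdfs:${userId}:`,
    "SCHEMA", "content", "TEXT", "PHONETIC", "dm:en",
  ];
}

// Mirrors how an index decides whether a hash key belongs to it: a plain prefix match.
function coveredBy(prefix: string, key: string): boolean {
  return key.startsWith(prefix);
}

const otherUsersKey = "ax:pdfs:u10:p7";
console.log(coveredBy("ax:pdfs:u1", otherUsersKey)); // true: u10's PDF would land in u1's index
console.log(coveredBy(ftCreateArgs("u1")[6], otherUsersKey)); // false
```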

Read the complete blog
