Building a Perplexity-like Open Source AI Search with SWIRL

srbhr

Saurabh Rai

Posted on October 24, 2024


Perplexity AI is an AI-powered search engine that has gained traction for its ability to deliver answers to search queries by combining search engines and AI models.

Unlike traditional search engines such as Bing, Google, and Yahoo, Perplexity uses large language models to interpret and respond to queries, providing results that are not just keyword-based but contextually relevant.

Building Perplexity

Searching for answers

We can draw three important factors from the paragraphs above.

  1. Understanding user queries written in natural language, owing to LLMs and some prompt engineering.
  2. The search capability to find and fetch answers from different sources.
  3. Bringing all the results together and generating an AI answer from them while citing the sources.
    • (Optional) Re-ranking the results so they are more relevant to what the user asked for.

So, to have a working Perplexity for your own data and data sources, we need a solution that connects to data sources, apps, and databases to fetch data. It also needs a search infrastructure in place and an integration with an AI model for summaries, re-ranking, etc.

For this part, we’ll be using SWIRL for the integration with apps and AI models. For the search part, we’ll be using the default Google PSE (Programmable Search Engine), which comes built in with SWIRL’s Docker container. You can add more apps to search from, but that’s a tutorial for some other time. 😁

SWIRL on GitHub.

Setting up SWIRL

To try SWIRL in Docker using the GitHub guide, follow the steps below. There’s also a short YouTube tutorial that I made, which you can watch.

Prerequisites

  1. Ensure that you have Docker installed and running on your system (MacOS, Linux, or Windows).
  2. Windows users may need to install and configure either WSL 2 or Hyper-V.
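
A quick way to confirm the Docker prerequisite before continuing is a small preflight check. This is only a minimal sketch and not part of the SWIRL guide: `have_cmd` and `preflight` are made-up helper names, and the script relies only on the standard `docker info` command, which fails when the CLI cannot reach a running daemon.

```shell
#!/usr/bin/env sh
# Preflight sketch: is Docker installed, and is the daemon running?
# have_cmd and preflight are hypothetical helper names, not SWIRL tooling.

have_cmd() {
    # Succeeds if the given command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

preflight() {
    if ! have_cmd docker; then
        echo "docker not found: install Docker first"
        return 1
    fi
    # `docker info` exits non-zero when no daemon is reachable.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the daemon is not running"
        return 1
    fi
    echo "docker is installed and running"
}

# Example: preflight && echo "ready to set up SWIRL"
```

If the check fails on Windows, the WSL 2 or Hyper-V configuration mentioned above is a likely place to look.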

Steps to Set Up SWIRL in Docker 🐋

  1. Download the Docker YAML File:

    • Open a terminal and run:
     curl https://raw.githubusercontent.com/swirlai/swirl-search/main/docker-compose.yaml -o docker-compose.yaml
    
  2. Start SWIRL with Docker:

    • For MacOS or Linux, run:
     docker-compose pull && docker-compose up
    
    • For Windows, run from PowerShell:
     docker compose up
    
  3. Enable Real-Time RAG (Retrieval-Augmented Generation):

    • Set environment variables:
     export MSAL_CB_PORT=8000
     export MSAL_HOST=localhost
     export OPENAI_API_KEY='your-OpenAI-API-key'
    
    • Restart the SWIRL containers (stop them, then run the compose command again from the same shell) to activate the RAG features.
  4. Access SWIRL:
    • Open your browser and go to http://localhost:8000.
    • Log in using the admin credentials (admin as username and password as the default password).
    • Enter a search query and click 'Search' to see results.
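
Two things commonly go wrong at this point: the RAG variables are not set in the shell that starts the containers, or the browser is opened before the web UI is ready. The sketch below checks for both. It is an illustration rather than anything from the SWIRL docs: `require_env` and `wait_for_url` are hypothetical helper names, and the default retry count and delay are arbitrary.

```shell
#!/usr/bin/env sh
# Sketch: verify the RAG environment variables and wait for the SWIRL UI.
# require_env and wait_for_url are hypothetical helpers for illustration.

require_env() {
    # Fails, naming the variable, if any listed variable is unset or empty.
    # Only pass literal variable names; eval is used for the indirect lookup.
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name"
            return 1
        fi
    done
}

wait_for_url() {
    # Polls a URL with curl until it responds or the attempts run out.
    # $1 = URL, $2 = max attempts (default 30), $3 = seconds between tries (default 2)
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: fail on HTTP status >= 400, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   require_env MSAL_CB_PORT MSAL_HOST OPENAI_API_KEY \
#     && wait_for_url http://localhost:8000 \
#     && echo "SWIRL is up"
```

Run `require_env` in the same terminal where you exported the variables, before starting the containers, and `wait_for_url` once they are starting up.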

If you need more help, refer to the full guide here.

Start Searching and Generate AI Summaries 🔍

Once we have SWIRL up and running, we can start searching for different queries and generating AI summaries on top of the results. The best part is that SWIRL provides re-ranking of search results.

Searching for “Best open source search engines”

Searching for top open source search engines

Searching for “Attention is all you need”

Searching for Attention is all you need

AI Summary of “Attention is all you need”

AI Summary Attention is all you need

So far, it works pretty well with Google PSE, but you can also search specific websites, as seen in the left pane. Adding a connector is also pretty easy and just a PR away.

Adding more apps and personalizing your Search Experience

Using Google search results for search and RAG is a great start, but the real advantage comes when SWIRL is connected to various apps. This enables more comprehensive searches and AI-generated summaries directly from your workplace data, making information discovery faster and more effective.

To get started, simply visit the admin panel and enter the bearer token for the apps you want to connect. There’s a detailed video guide available that walks you through integrating with OpenSearch, along with documentation on supported connectors. This makes it easy to configure SWIRL and maximize its search capabilities across your connected data sources.

Join our SWIRL Community


SWIRL is open-source and licensed under Apache 2.0. We’d love for you to check it out on GitHub and give it a Star. It’s a great way to support us and keep us motivated to roll out new features!

Give us a 🌟 on GitHub.

If you’re interested in adding a new website or app as a searchable connector, we welcome your contributions.

Join our Slack.
