Hello Devs,
The team at Swirl has created this amazing guide which contains all the relevant information for anyone who wants to extend Swirl by adding SearchProviders, Connectors, and Processors.
This makes it easy for you to contribute to Swirl and get started with open source. We're also participating in Hacktoberfest and giving out swag worth up to $100 to contributors; check the blog for more information.
Learn more about Swirl by checking out this article below.
Creating a SearchProvider
A SearchProvider is a configuration of a Connector. So, to connect to a given source, first verify that it is supported by a Connector you already have. (See the next tutorial for information on creating new Connectors.)
For example, to query a website using a URL like https://host.com/?q=my+query+here that returns JSON or XML, create a new SearchProvider that configures the RequestsGet connector, as in the example configuration further down.
To learn more about query and URL parameters, refer to the Developer Guide.
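As a rough illustration of how a query_template such as "{url}?q={query_string}" could expand into the final request URL, here is a minimal sketch. The function name build_query_url is hypothetical, and URL-encoding the query with quote_plus is an assumption; Swirl's RequestsGet connector may handle encoding differently.

```python
from urllib.parse import quote_plus


def build_query_url(url: str, query_template: str, query_string: str) -> str:
    """Fill a template like "{url}?q={query_string}" (hypothetical helper)."""
    # Assumption: spaces are encoded as '+', as in the example URL above
    return query_template.format(url=url, query_string=quote_plus(query_string))


print(build_query_url("https://host.com/", "{url}?q={query_string}", "my query here"))
```

With the values from the example, this should produce the same URL shape as https://host.com/?q=my+query+here.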
If the website offers the ability to page through results, or sort results by date (as well as relevancy), use the PAGE= and DATE_SORT query mappings to add support for these features through Swirl.
Open the query URL in a browser and look through the JSON response.
If using Visual Studio Code, right-click on the pasted JSON and select Format Document to make it easier to read.
Identify the results list and the number of results found and retrieved, and put these JSON paths in response_mappings. Then identify the JSON paths for the Swirl default fields title, body, url, date_published and author within each item of the result list, and put them in result_mappings, with the Swirl field name on the left and the source JSON path on the right.
    {
        "name": "My New SearchProvider",
        "connector": "RequestsGet",
        "url": "https://host.com/",
        "query_template": "{url}?q={query_string}",
        "query_processors": [
            "AdaptiveQueryProcessor"
        ],
        "query_mappings": "",
        "result_processors": [
            "MappingResultProcessor",
            "CosineRelevancyResultProcessor"
        ],
        "response_mappings": "FOUND=jsonpath.to.number.found,RETRIEVED=jsonpath.to.number.retrieved,RESULTS=jsonpath.to.result.list",
        "result_mappings": "url=link,body=snippet,author=displayLink,NO_PAYLOAD",
        "credentials": "bearer=your-bearer-token-here",
        "tags": [
            "MyTag"
        ]
    }
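To make the left/right convention of the mappings concrete, the sketch below applies response_mappings and result_mappings to a made-up JSON response. It is a simplification, not Swirl's MappingResultProcessor: the helpers parse_mappings, lookup and map_item are hypothetical, the sample response is invented, and only plain dotted paths are supported rather than full JSONPath.

```python
from typing import Any, Dict


def parse_mappings(mappings: str) -> Dict[str, str]:
    """Turn "a=b,c=d,FLAG" into {"a": "b", "c": "d"}; bare flags are skipped."""
    pairs = {}
    for entry in mappings.split(","):
        if "=" in entry:
            key, path = entry.split("=", 1)
            pairs[key.strip()] = path.strip()
    return pairs


def lookup(data: Any, path: str) -> Any:
    """Follow a plain dotted path such as "info.total" through nested dicts."""
    for key in path.split("."):
        data = data[key]
    return data


def map_item(item: dict, result_mappings: str) -> dict:
    """Copy source fields into Swirl field names (left = Swirl, right = source)."""
    return {
        swirl_field: lookup(item, source_path)
        for swirl_field, source_path in parse_mappings(result_mappings).items()
    }


# Invented response, standing in for what the source might return
response = {
    "info": {"total": 120, "count": 2},
    "items": [
        {"link": "https://host.com/a", "snippet": "First hit", "displayLink": "host.com"},
        {"link": "https://host.com/b", "snippet": "Second hit", "displayLink": "host.com"},
    ],
}
response_mappings = "FOUND=info.total,RETRIEVED=info.count,RESULTS=items"
result_mappings = "url=link,body=snippet,author=displayLink,NO_PAYLOAD"

paths = parse_mappings(response_mappings)
found = lookup(response, paths["FOUND"])
retrieved = lookup(response, paths["RETRIEVED"])
results = [map_item(item, result_mappings) for item in lookup(response, paths["RESULTS"])]
print(found, retrieved, results[0])
```

Entries without an equals sign, such as NO_PAYLOAD, are treated here as flags and ignored; how Swirl interprets them is described in the Developer Guide.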
Go to Swirl at localhost:8000/swirl/searchproviders/, logging in if necessary. Put the form at the bottom of the page into RAW mode, and paste the SearchProvider in. Then hit POST. The SearchProvider will reload.
Go to Galaxy at localhost:8000/galaxy/ and run a search using the tag you set in the SearchProvider. Results should appear within a few seconds.
Creating a Connector
In Swirl, Connectors are responsible for loading a SearchProvider, then constructing and transmitting queries to a particular type of service, then saving the response - typically a result list.
Note: Consider using your favorite coding AI to generate a Connector by passing it the Connector base classes, and information about the API you are trying to query.
Note: If you are trying to send an HTTP/S request to an endpoint that returns JSON or XML, you don't need to create a Connector. Instead, create a SearchProvider that configures the RequestsGet connector included with Swirl.
To create a new Connector:
Create a new file, e.g. swirl/connectors/my_connector.py
Copy the style of the ChatGPT connector as a starting point, or the BigQuery connector if targeting a database.
In the __init__ method, load and persist anything that will be needed when connecting to and querying the service. Use the ChatGPT Connector as a guide.
Import the python package(s) to connect to the service. The ChatGPT connector uses the openai package, for example:
    import openai
Modify the execute_search method to connect to the service.
As you can see from the ChatGPT Connector, it first loads the OpenAI credentials, then constructs a prompt, sends the prompt via openai.ChatCompletion.create(), then stores the response.
    def execute_search(self, session=None):

        logger.debug(f"{self}: execute_search()")

        if self.provider.credentials:
            openai.api_key = self.provider.credentials
        else:
            if getattr(settings, 'OPENAI_API_KEY', None):
                openai.api_key = settings.OPENAI_API_KEY
            else:
                self.status = "ERR_NO_CREDENTIALS"
                return

        prompted_query = ""
        if self.query_to_provider.endswith('?'):
            prompted_query = self.query_to_provider
        else:
            if 'PROMPT' in self.query_mappings:
                prompted_query = self.query_mappings['PROMPT'].format(query_to_provider=self.query_to_provider)
            else:
                prompted_query = self.query_to_provider
                self.warning(f'PROMPT not found in query_mappings!')

        if 'CHAT_QUERY_REWRITE_GUIDE' in self.query_mappings:
            self.system_guide = self.query_mappings['CHAT_QUERY_REWRITE_GUIDE'].format(query_to_provider=self.query_to_provider)

        if not prompted_query:
            self.found = 0
            self.retrieved = 0
            self.response = []
            self.status = "ERR_PROMPT_FAILED"
            return

        logger.info(f'CGPT completion system guide:{self.system_guide} query to provider : {self.query_to_provider}')
        self.query_to_provider = prompted_query
        completions = openai.ChatCompletion.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": self.system_guide},
                {"role": "user", "content": self.query_to_provider},
            ],
            temperature=0,
        )
        message = completions['choices'][0]['message']['content']  # FROM API Doc

        self.found = 1
        self.retrieved = 1
        self.response = message.replace("\n\n", "")

        return
ChatGPT depends on the OpenAI API key, which is provided to Swirl via the .env file. To follow this pattern, create new values in .env then modify swirl_server/settings.py to load them as Django settings, and set a reasonable default.
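The pattern of reading a value from the environment with a reasonable default might look like the sketch below. The setting names MY_SERVICE_API_KEY and MY_SERVICE_TIMEOUT and the helper env_setting are hypothetical, and this uses only the standard library; the actual swirl_server/settings.py may load .env values through a different mechanism.

```python
import os


def env_setting(name: str, default: str = "") -> str:
    """Return an environment value, falling back to a default when it is unset."""
    return os.environ.get(name, default)


# Hypothetical settings for a new Connector; pick names that match your service
MY_SERVICE_API_KEY = env_setting("MY_SERVICE_API_KEY")
MY_SERVICE_TIMEOUT = int(env_setting("MY_SERVICE_TIMEOUT", "10"))
```

The Connector could then read these with getattr(settings, 'MY_SERVICE_API_KEY', None), mirroring how the ChatGPT connector checks OPENAI_API_KEY.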
Modify the normalize_response() method to store the raw response. This is literally no more (or less) than writing the result objects out as a Python list and storing that in self.results:
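A minimal sketch of what such a normalize_response() could look like follows. The class MyConnectorSketch is a stand-in that models only the attributes used here, not Swirl's Connector base class, and the author label "MyService" is a placeholder.

```python
from datetime import datetime


class MyConnectorSketch:
    """Stand-in for a Connector; only the attributes used below are modelled."""

    def __init__(self, query_to_provider: str, response: str):
        self.query_to_provider = query_to_provider
        self.response = response  # raw response stored by execute_search()
        self.results = []

    def normalize_response(self):
        # Write the raw response out as a Python list of result dicts
        self.results = [
            {
                "title": self.query_to_provider,
                "body": str(self.response),
                "author": "MyService",  # placeholder source label
                "date_published": str(datetime.now()),
            }
        ]


connector = MyConnectorSketch("some query", "An answer from the service.")
connector.normalize_response()
print(connector.results)
```

A service that returns many items would build one dict per item instead of a single-element list.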
Create a SearchProvider to configure the new Connector, then add it to the Swirl installation as noted in the Create a SearchProvider tutorial.
Don't forget a useful tag so you can easily target the new connector when ready to test.
To learn more about developing Connectors, refer to the Developer Guide.
Creating a QueryProcessor
A QueryProcessor is a stage executed during either Pre-Query or Query Processing. The difference is that Pre-Query processing is applied once, to the query sent to all SearchProviders, while Query Processing is executed by each individual SearchProvider. In both cases, the goal is to modify the query sent to some or all SearchProviders.
Create a new file, e.g. swirl/processors/my_query_processor.py
Copy the GenericQueryProcessor class as a starting point, and rename it:
    class MyQueryProcessor(QueryProcessor):

        type = 'MyQueryProcessor'

        def process(self):
            # TO DO: modify self.query_string, and return it
            return self.query_string + ' modified'
Save the module.
Add the new module to swirl/processors/__init__.py
Go to Galaxy http://localhost:8000/swirl/search/?q=some+query
Run a search; if the QueryProcessor was added to a specific SearchProvider, be sure to target that SearchProvider. For example, if you added a QueryProcessor to the query_processing pipeline of a SearchProvider with tag "news", the query would be http://localhost:8000/swirl/search/?q=news:some+query instead.
Results should appear in just a few seconds. In the messages block, a message indicating that the new QueryProcessor rewrote the query should appear:
MyQueryProcessor rewrote Strategy Consulting - Google PSE's query to: <modified-query>
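Before wiring a QueryProcessor into Swirl, it can help to try the rewrite logic on its own. The sketch below does that with a stand-in QueryProcessor base class, so it runs without Swirl installed; the real base class has a different constructor, and the whitespace and trailing-period cleanup is just an example rewrite, not something Swirl prescribes.

```python
class QueryProcessor:
    """Stand-in for Swirl's base class, so the sketch runs without Swirl."""

    type = "QueryProcessor"

    def __init__(self, query_string: str):
        self.query_string = query_string

    def process(self) -> str:
        return self.query_string


class MyQueryProcessor(QueryProcessor):

    type = "MyQueryProcessor"

    def process(self) -> str:
        # Example rewrite: collapse repeated whitespace, drop a trailing period
        cleaned = " ".join(self.query_string.split())
        return cleaned.rstrip(".")


print(MyQueryProcessor("  strategy   consulting. ").process())
```

Once the logic behaves as expected, the body of process() can be moved into the class derived from Swirl's own QueryProcessor.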
To learn more about writing Processors, refer to the Developer Guide.
Creating a ResultProcessor
A ResultProcessor is a stage executed by each SearchProvider, after the Connector has retrieved results. ResultProcessors operate on results and transform them as needed for downstream consumption or presentation.
The GenericResultProcessor and MappingResultProcessor stages are intended to normalize JSON results. GenericResultProcessor searches for exact matches to the Swirl schema (as noted in the SearchProvider example) and copies them over. MappingResultProcessor applies result_mappings to normalize the results, again as shown in the SearchProvider example above. In general adding stages after these is a good idea, unless the SearchProvider is expected to respond in a Swirl schema format.
To create a new ResultProcessor:
Create a new file, e.g. swirl/processors/my_result_processor.py
Copy the GenericResultProcessor class as a starting point, and rename it. Don't forget the __init__ method.
Implement the process() method. This is the only one required.
process() operates on self.results, which will contain all the results from a given SearchProvider, in Python list format. Modify items in the result list, and report the number updated.
    def process(self):

        if not self.results:
            return

        updated = 0
        for item in self.results:
            # TO DO: operate on each item and count number updated
            item['my_field1'] = 'test'
            updated = updated + 1
            # note: there is no need to save in this type of Processor

        # save modified self.results
        self.processed_results = self.results
        # save number of updated
        self.modified = updated
        return self.modified
Save the module.
Add the new module to swirl/processors/__init__.py
Go to Galaxy http://localhost:8000/swirl/search/?q=some+query
Run a search; be sure to target at least one SearchProvider that has the new ResultProcessor.
For example if you added a ResultProcessor to a SearchProvider result_processing pipeline with tag "news", the query would need to be http://localhost:8000/swirl/search/?q=news:some+query instead of the above.
Results should appear in just a few seconds. In the messages block, a message indicating that the new ResultProcessor updated a number of results should appear, and the content should be modified as expected.
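The same process() shape can be exercised outside Swirl with a plain list of dicts. In this sketch the ResultProcessor base class is a stand-in, and TruncateBodyResultProcessor with its MAX_BODY limit is a hypothetical example that only counts the items it actually changes.

```python
class ResultProcessor:
    """Stand-in for Swirl's base class; holds one SearchProvider's results."""

    type = "ResultProcessor"

    def __init__(self, results):
        self.results = results
        self.processed_results = None
        self.modified = 0


class TruncateBodyResultProcessor(ResultProcessor):

    type = "TruncateBodyResultProcessor"
    MAX_BODY = 40  # hypothetical limit, in characters

    def process(self):
        if not self.results:
            return 0  # nothing to do
        updated = 0
        for item in self.results:
            body = item.get("body", "")
            if len(body) > self.MAX_BODY:
                item["body"] = body[: self.MAX_BODY].rstrip() + "..."
                updated += 1
        # Store the modified list and the number of items changed
        self.processed_results = self.results
        self.modified = updated
        return self.modified


sample = [{"body": "short"}, {"body": "x" * 50}]
processor = TruncateBodyResultProcessor(sample)
print(processor.process(), sample[1]["body"])
```

Counting only changed items keeps the number reported in the messages block meaningful when most results need no modification.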
To learn more about writing Processors, refer to the Developer Guide.
Creating a PostResultProcessor
A PostResultProcessor is a stage executed after all SearchProviders have returned results. It operates on all the results for a given query.
To create a new PostResultProcessor:
Create a new file, e.g. swirl/processors/my_post_result_processor.py
Copy the template below as a starting point, and rename it:
    class MyPostResultProcessor(PostResultProcessor):

        type = 'MyPostResultProcessor'

        ############################################

        def __init__(self, search_id, request_id=''):
            return super().__init__(search_id, request_id=request_id)

        ############################################

        def process(self):

            updated = 0

            for results in self.results:
                if not results.json_results:
                    continue
                for item in results.json_results:
                    # TO DO: operate on each result item
                    item['my_field2'] = "test"
                    updated = updated + 1
                # end for
                # call results.save() if any results were modified
                if updated > 0:
                    results.save()
            # end for

            ############################################

            self.results_updated = updated
            return self.results_updated
Modify the process() method, operating on the items and saving each result set as shown.
Add the new module to swirl/processors/__init__.py
Go to Galaxy http://localhost:8000/swirl/search/?q=some+query
Run a search; be sure to target at least one SearchProvider that has the new PostResultProcessor.
For example if you added a PostResultProcessor to a Search post_result_processing pipeline with tag "news", the query would need to be http://localhost:8000/swirl/search/?q=news:some+query instead of the above.
Results should appear in just a few seconds. In the messages block, a message indicating that the new PostResultProcessor updated a number of results should appear, and the content should be modified as expected.
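In the template, the updated counter accumulates across result sets, so once any item has been modified, save() is also called for every later non-empty set. If saving only the sets that actually changed matters, one possible variation is a per-set counter, sketched below. ResultSet is a stand-in whose save() merely counts calls, and process_all is a hypothetical free function rather than a Swirl method.

```python
class ResultSet:
    """Stand-in for a stored result set; save() only counts how often it ran."""

    def __init__(self, json_results):
        self.json_results = json_results
        self.saved = 0

    def save(self):
        self.saved += 1


def process_all(result_sets) -> int:
    """Tag items missing 'my_field2' and save only the sets that changed."""
    total_updated = 0
    for results in result_sets:
        if not results.json_results:
            continue
        updated_here = 0  # per-set counter
        for item in results.json_results:
            if "my_field2" not in item:
                item["my_field2"] = "test"
                updated_here += 1
        if updated_here > 0:
            results.save()
        total_updated += updated_here
    return total_updated


sets = [
    ResultSet([{"title": "a"}]),
    ResultSet([]),
    ResultSet([{"title": "b", "my_field2": "kept"}]),
]
total = process_all(sets)
print(total, [s.saved for s in sets])
```

Here only the first set is saved, because the second is empty and the third already has the field.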