StarSense - a new way of interacting with repos

This is a submission for the Open Source AI Challenge with pgai and Ollama

What I Built

I built StarSense, an intelligent chat interface that helps developers search and discover their starred GitHub repositories using natural language. The project uses RAG (Retrieval-Augmented Generation): it retrieves the most relevant repository content for a question and passes it to a language model, so you can hold a conversation with your GitHub stars.

StarSense automatically processes your starred repositories by:

Authenticating with GitHub via OAuth
Fetching all starred repositories
Extracting and processing README content
Storing repository information in PostgreSQL
Generating embeddings using pgai vectorizer
Enabling natural language queries using vector similarity search
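
The fetching step in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the StarSense repo: the helper names fetch_starred_page and iter_starred are hypothetical, and it assumes the OAuth access token is already available. It calls GitHub's GET /user/starred endpoint with per_page and page parameters and stops at the first empty page.

```python
import json
import urllib.request
from typing import Callable, Iterator

GITHUB_API = "https://api.github.com"


def fetch_starred_page(token: str, page: int, per_page: int = 100) -> list[dict]:
    """Fetch one page of the authenticated user's starred repositories."""
    req = urllib.request.Request(
        f"{GITHUB_API}/user/starred?per_page={per_page}&page={page}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_starred(fetch_page: Callable[[int], list[dict]]) -> Iterator[dict]:
    """Yield name/url pairs page by page until an empty page comes back."""
    page = 1
    while True:
        repos = fetch_page(page)
        if not repos:
            return
        for repo in repos:
            # full_name and html_url are fields of GitHub's repository objects
            yield {"name": repo["full_name"], "url": repo["html_url"]}
        page += 1
```

With a real token this would be used as list(iter_starred(lambda p: fetch_starred_page(token, p))). Passing the page fetcher in as a callable keeps the pagination logic testable without network access.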

Demo/Repo

Repo: https://github.com/XamHans/starsense
Video: https://youtu.be/Uf1uzI0e3jM

The application features a clean chat interface where users can query their starred repositories in natural language.

The architecture integrates Timescale, pgai, and Ollama.

Tools Used

Frontend

Next.js 14: React framework for building the web interface
TypeScript: For type-safe code
TailwindCSS: For styling and responsive design
NextAuth.js: Handling GitHub OAuth authentication
WebSocket Client: Real-time updates during repository ingestion

Backend

FastAPI: Modern Python web framework for building the API
WebSocket: Real-time connection for providing ingest phase status updates
Poetry: Python dependency management
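
One way the ingest status updates could be structured is sketched below. It is an assumption-laden illustration rather than the project's actual code: the function name ingest_with_progress and the message fields (phase, done, total) are hypothetical. The send argument stands in for whatever pushes a message to the client; in a FastAPI WebSocket endpoint that could be websocket.send_json.

```python
import asyncio
from typing import Awaitable, Callable

# Any async callable that delivers one JSON-serializable message to the client.
Send = Callable[[dict], Awaitable[None]]


async def ingest_with_progress(repos: list[str], send: Send) -> None:
    """Process repositories one by one and report progress after each."""
    total = len(repos)
    await send({"phase": "fetching", "done": 0, "total": total})
    for done, repo in enumerate(repos, start=1):
        # Placeholder for the real work (fetch README, store row in PostgreSQL).
        await asyncio.sleep(0)
        await send({"phase": "processing", "repo": repo, "done": done, "total": total})
    await send({"phase": "complete", "done": total, "total": total})
```

Because the function only depends on a send callable, the same logic can be driven by a WebSocket in production and by a simple list-appending stub in tests.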

AI and Vector Search

pgai Vectorizer: Generates embeddings for each repository's README content with OpenAI's text-embedding-3-small model (1536 dimensions), using the following configuration:

SELECT ai.create_vectorizer(
  'public.repositories'::regclass,
  embedding=>ai.embedding_openai('text-embedding-3-small', 1536, api_key_name=>'OPENAI_API_KEY'),
  chunking=>ai.chunking_recursive_character_text_splitter('readme'),
  formatting=>ai.formatting_python_template('name: $name url: $url content: $chunk')
);
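
To make the formatting template in the configuration above more concrete, here is a small sketch of the text that would be embedded for one README chunk. It assumes the template follows Python's $-style substitution, as the function name formatting_python_template suggests, and emulates it with string.Template. pgai performs this step inside the database; the helper format_chunk is purely illustrative.

```python
from string import Template

# Same template string as in the vectorizer configuration.
TEMPLATE = Template("name: $name url: $url content: $chunk")


def format_chunk(name: str, url: str, chunk: str) -> str:
    """Build the text that gets embedded for a single README chunk."""
    return TEMPLATE.substitute(name=name, url=url, chunk=chunk)


print(format_chunk(
    "XamHans/starsense",
    "https://github.com/XamHans/starsense",
    "An intelligent chat interface for starred repositories.",
))
```

Prefixing every chunk with the repository name and URL means each embedded chunk still carries its source, even when the README text itself never mentions the repository.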

AI Extensions: The project uses several PostgreSQL extensions available on Timescale:
- ai (pgai) for core AI functionality
- vector (pgvector) for similarity search
- vectorscale (pgvectorscale) for scalable vector operations
Ollama: Generates natural language responses from the retrieved repository content, using the llama3 model.
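
The retrieval and generation steps described above can be sketched in plain Python. This is a simplified illustration under several assumptions: in the real project the similarity search would run inside PostgreSQL rather than in application code, the helper names (cosine_distance, top_k, build_prompt, ask_ollama) and the prompt wording are hypothetical, and Ollama is assumed to be listening on its default local port 11434.

```python
import json
import math
import urllib.request


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the starred repositories below.\n\n"
        f"Repositories:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def ask_ollama(prompt: str, model: str = "llama3",
               host: str = "http://localhost:11434") -> str:
    """Send the prompt to Ollama's /api/generate endpoint (non-streaming)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

A full query would embed the question with the same embedding model used for the chunks, call top_k, and pass build_prompt's result to ask_ollama. Restricting the model to the retrieved context is what keeps answers grounded in the user's own stars.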

Database & Infrastructure

TimescaleDB: PostgreSQL-based database with vector search capabilities
GitHub API: For fetching starred repositories and README content

Final Thoughts

Building StarSense has been an exciting journey in combining modern AI technologies with practical developer tools. Pairing pgai's vectorizer for retrieval with Ollama's language models for generation makes repository discovery feel natural and intuitive.

Some key learnings and highlights:

The pgai vectorizer dramatically simplified the embedding process by:
- Automatically handling document chunking and preprocessing
- Managing embedding generation and storage
- Eliminating the need for separate embedding infrastructure
- Seamlessly integrating with existing PostgreSQL workflows
Timescale's AI extensions provided a robust foundation for vector operations
Ollama's open-source models offered great performance for natural language generation
The WebSocket implementation enabled real-time feedback during the repository ingestion process
The combination of Next.js 14 and FastAPI created a performant and developer-friendly stack

This submission qualifies for the following prize categories: