New Feature: Caching LLM Responses with a Redis Instance

fadingNA

Posted on October 25, 2024

First of all, this is the third pull request of my Hacktoberfest contributions during October 2024. I had the chance to contribute to a big project called DocsGPT; the project overview is at the link below.

GitHub logo arc53 / DocsGPT

Chatbot for documentation, that allows you to chat with your data. Privately deployable, provides AI knowledge sharing and integrates knowledge into your AI workflow

DocsGPT 🦖

Open-Source Documentation Assistant

DocsGPT is a cutting-edge open-source solution that streamlines the process of finding information in project documentation. With its integration of powerful GPT models, developers can easily ask questions about a project and receive accurate answers.

Say goodbye to time-consuming manual searches, and let DocsGPT help you quickly find the information you need. Try it out and see how it revolutionizes your project documentation experience. Contribute to its development and be a part of the future of AI-powered assistance.


Roadmap

You can find our roadmap here. Please don't hesitate to contribute or create issues, it helps us improve DocsGPT!

Our Open-Source Models Optimized for DocsGPT:

  Name                 Base Model   Requirements (or similar)
  Docsgpt-7b-mistral   Mistral-7b
Beginning

  • I began by checking out the issues listed in the repository. I found a feature request that the maintainers needed help with, and I thought, "Why not give it a shot?" So, I joined their Discord channel and started chatting with the maintainers about the new feature they wanted, as well as the coding style they preferred. Here's the issue I tackled:

    🚀 Feature: Caching for DocsGPT #1295

    pabik posted:

    🔖 Feature description

    We need to implement caching for DocsGPT to improve performance and efficiency. If the same question is asked, using the same source and the same LLM, the result should be retrieved from the cache rather than triggering a new API call.

    Redis is already configured and used for Celery tasks, so the cache system should leverage Redis for storing and retrieving these cached responses.

    🎤 Why is this feature needed?

    This feature will improve the performance of DocsGPT by avoiding redundant API calls for identical requests. It will reduce response time for repeated queries, lower API costs, and improve user experience, especially for frequently asked questions.

    When users ask similar questions repeatedly using the same data source, there's no need to re-run the same logic each time (at least for some time period). By introducing caching, we can streamline this process.

    ✌️ How do you aim to achieve this?

    The implementation will involve:

    1. Using Redis as the caching layer to store LLM responses, indexed by the combination of question, source, and LLM used.
    2. Checking the cache before executing new LLM queries to see if a cached result is available.
    3. Triggering from cache if an identical question is found in the cache, otherwise proceeding with the usual query process and then storing the result for future use.

    This is a challenging task, and we'd love to collaborate on it. You can contribute directly in this issue or join the discussion in our Discord (collaborative-issues).

    We also encourage splitting this issue into smaller, manageable tasks, but please link them back to this original issue for tracking purposes.

    🔄️ Additional Information

    No response

    👀 Have you spent some time to check if this feature request has been raised before?

    • [X] I checked and didn't find similar issue

    Are you willing to submit PR?

    None

    View on GitHub: https://github.com/arc53/DocsGPT/issues/1295
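To make the flow described in the issue concrete, here is a minimal sketch of the check-then-store pattern with Redis. Everything in it (the key prefix, the TTL value, the run_llm callable) is my own illustration of the idea, not DocsGPT's actual code.

```python
import hashlib
import json

import redis

# Redis is already used for Celery in DocsGPT, so a cache could reuse the same instance.
redis_client = redis.Redis(host="localhost", port=6379, db=0)

CACHE_TTL_SECONDS = 3600  # illustrative "some time period" after which entries expire


def make_cache_key(question: str, source: str, llm_name: str) -> str:
    """Derive a deterministic key from the question, source, and LLM combination."""
    raw = json.dumps({"q": question, "src": source, "llm": llm_name}, sort_keys=True)
    return "llm_cache:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def answer_with_cache(question: str, source: str, llm_name: str, run_llm):
    """Return a cached answer when available; otherwise query the LLM and cache the result."""
    key = make_cache_key(question, source, llm_name)
    cached = redis_client.get(key)
    if cached is not None:
        return cached.decode("utf-8")       # cache hit: no new API call
    answer = run_llm(question, source)      # cache miss: usual query process
    redis_client.setex(key, CACHE_TTL_SECONDS, answer)
    return answer
```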



  • After some back-and-forth discussions, I got a better understanding of what was needed. I also learned a lot about the project's structure. It turns out the repository includes both the UI and the server, plus services like PostgreSQL and Redis running in Docker containers. It was a bit overwhelming at first, but I got the hang of it.

    Caching docsgpt #1308

    What kind of change does this PR introduce? (New feature: caching)

    The changes are applied in the BaseLLM class to ensure that all LLM queries (both standard and streaming) benefit from:

    • Caching of responses to improve performance.
    • Token usage tracking for monitoring API costs.
    • The concrete LLM implementations now automatically apply caching and token tracking without modifying their core logic.

    Why was this change needed? (You can also link to an open issue here)

    • #1295

    Other information

    • The addition of caching and token usage tracking was necessary to improve performance and reduce redundant API calls for LLM queries. This change also allows monitoring of token usage for better cost management. By caching the results of similar requests, repeated queries can retrieve cached responses, thus saving time and reducing API costs.

    Additionally, the use of decorators makes the code more modular, allowing the caching and token tracking logic to be applied across different LLM implementations without modifying each one.
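Since the changes hinge on decorating methods of the BaseLLM class, here is a rough sketch of how that pattern can look. The class and method names (BaseLLM, gen, _raw_gen) and the in-memory dictionary standing in for Redis are my own assumptions for illustration, not the exact code from the PR.

```python
import functools

# Simple in-memory stand-in for the Redis cache, just for illustration.
_cache: dict = {}


def gen_cache(func):
    """Decorator sketch: return a cached response for identical (model, messages) calls."""

    @functools.wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        key = (self.__class__.__name__, model, str(messages))
        if key in _cache:
            return _cache[key]              # cache hit: skip the provider call
        result = func(self, model, messages, *args, **kwargs)
        _cache[key] = result                # store for future identical requests
        return result

    return wrapper


class BaseLLM:
    """Abstract base class: concrete LLMs override _raw_gen and get caching for free."""

    def gen(self, model, messages, *args, **kwargs):
        return self._cached_gen(model, messages, *args, **kwargs)

    @gen_cache
    def _cached_gen(self, model, messages, *args, **kwargs):
        return self._raw_gen(model, messages, *args, **kwargs)

    def _raw_gen(self, model, messages, *args, **kwargs):
        raise NotImplementedError


class ExampleLLM(BaseLLM):
    def _raw_gen(self, model, messages, *args, **kwargs):
        # A real implementation would call the provider's API here.
        return f"response from {model}"
```

Because the decorator lives on the base class, a concrete implementation like ExampleLLM only has to provide _raw_gen; caching (and, in the real PR, token tracking) is applied automatically.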


  • Diving into the code

    • It didn't take long before I felt comfortable with the server architecture, which uses Python 3 and runs the server with Flask. It was a fun task, and I really enjoyed seeing how everything connected, from the abstract base class to the concrete implementations, with a decorator function applying the new behavior to all LLM instances.

    What I Learned

    This experience taught me a lot about handling server-side caching and working within a larger, collaborative project. Here are some key takeaways:

    • Communication is key: Talking directly with the maintainers helped me understand exactly what was needed, saving time and avoiding confusion.

    • Project structure matters: Understanding how different parts of the project interact, like how the server communicates with PostgreSQL and Redis, made a huge difference.

    • Patience is a virtue: Large projects can be complex, and it takes time to navigate them. But the satisfaction of making a contribution is totally worth it!

    • If I could go back, I would spend a bit more time exploring the codebase before diving into my changes; it would have made the process easier. I would definitely keep up the open communication with the maintainers, since it made everything so much easier, and I would also research more about how caching behaves when requests run in parallel.

    Conclusion

    • The overall scope of this pull request is quite big: I had to organize the Redis logic so that the LLM responds from the cache whenever the same conversation pattern has happened before. It was not just about writing code. I had to think about how to structure the cache efficiently, so that it could store and retrieve responses quickly. That meant considering different conversation scenarios and making sure that the cache would be hit only when it made sense, preventing unnecessary API calls.
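To illustrate the kind of key design this required, here is a small sketch of how a cache key could cover the whole conversation pattern rather than just the last question, so that only genuinely identical requests hit the cache. The field names and key prefix are hypothetical, not the ones used in the actual PR.

```python
import hashlib
import json


def conversation_cache_key(history, question, source_id, llm_name):
    """Hash the full conversation context plus source and model, so two different
    conversations that happen to end with the same question do not collide."""
    payload = {
        "history": history,      # earlier (question, answer) turns in this chat
        "question": question,
        "source": source_id,
        "llm": llm_name,
    }
    raw = json.dumps(payload, sort_keys=True, default=str)
    # The digest would typically be stored in Redis with an expiry (e.g. via setex).
    return "docsgpt_cache:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()
```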