Can AI Help with Repository Base Code Understanding?

michal_kovacik

Michal Kovacik

Posted on June 19, 2024

Understanding and maintaining large codebases is a common challenge in software development, leading to significant time and resource expenditure. Addressing this issue is essential for improving developer productivity and reducing technical debt.

What is code? Code is a recipe for solving a concrete problem. Given only the code, you can reverse-engineer it to understand which problem it solves and how. That understanding lets you formulate user stories describing the problem, and from these user stories AI can generate new code. Is this just theoretical, or can current technology help create tools to solve this problem?

At DTIT, particularly within AI4Coding, we’re thinking about technical debt and how to address it.

We start from the premise that current AI systems cannot yet offer the in-depth contextual understanding necessary for effective coding support at the repository level. Users of AI tools for code generation and completion often encounter reliability issues when dealing with larger codebases.

Our research indicates that RAG (retrieval-augmented generation) can be beneficial but has limits. Even agentic approaches that use Chain-of-Thought or Tree-of-Thoughts prompting are insufficient on their own and can be costly. What else can help? Abstract Syntax Trees (ASTs) are useful, but each one describes a single file, so they don’t provide a repository-level understanding of the code.

Current research shows that knowledge graphs excel at modeling complex relationships and dependencies within code across entire repositories. We use RAG, agentic approaches, and ASTs, but knowledge graphs have been a game-changer for our product, Advanced Coding Assistant.
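To make the idea of a knowledge graph over a codebase a bit more concrete, here is a minimal sketch in Python. It is not how Advanced Coding Assistant is built; it only illustrates the general technique of parsing each file into an AST and then linking the per-file facts into repository-level triples. The two-file "repository", the relation names (`imports`, `defines`, `calls`), and the helpers `build_graph` and `callers_of` are all assumptions made up for this example, and the sketch only resolves calls to top-level functions by bare name.

```python
import ast

# Hypothetical two-file repository, inlined as strings so the sketch runs as-is.
REPO = {
    "billing.py": (
        "from pricing import apply_discount\n"
        "\n"
        "def invoice_total(items):\n"
        "    return apply_discount(sum(items))\n"
    ),
    "pricing.py": (
        "def apply_discount(amount):\n"
        "    return round(amount * 0.9, 2)\n"
    ),
}


def build_graph(repo):
    """Return a set of (subject, relation, object) triples for {filename: source}."""
    trees = {name: ast.parse(source) for name, source in repo.items()}

    # Pass 1: remember which file defines each top-level function.
    defined_in = {}
    for name, tree in trees.items():
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                defined_in[node.name] = name

    # Pass 2: emit edges, resolving calls across files by function name.
    triples = set()
    for name, tree in trees.items():
        for node in tree.body:
            if isinstance(node, ast.ImportFrom) and node.module:
                triples.add((name, "imports", node.module + ".py"))
            elif isinstance(node, ast.FunctionDef):
                caller = f"{name}::{node.name}"
                triples.add((name, "defines", caller))
                for inner in ast.walk(node):
                    is_plain_call = isinstance(inner, ast.Call) and isinstance(
                        inner.func, ast.Name
                    )
                    # Built-ins such as sum() are skipped: the repo does not define them.
                    if is_plain_call and inner.func.id in defined_in:
                        callee = inner.func.id
                        triples.add(
                            (caller, "calls", f"{defined_in[callee]}::{callee}")
                        )
    return triples


def callers_of(triples, target):
    """List every function that has a 'calls' edge pointing at target."""
    return sorted(s for s, rel, o in triples if rel == "calls" and o == target)


if __name__ == "__main__":
    graph = build_graph(REPO)
    for triple in sorted(graph):
        print(triple)
    print(callers_of(graph, "pricing.py::apply_discount"))
```

The `calls` edge from `billing.py::invoice_total` to `pricing.py::apply_discount` is the kind of cross-file fact that no single AST contains. Once such edges exist, a question like "what breaks if I change `apply_discount`?" becomes a graph lookup instead of a text search.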

Why do we still have “assistant” in the title? Even though we try to use the best known approaches, keeping the developer in the loop is crucial.

So my answer to the question I opened with is YES, but we are not in the Harry Potter universe: AI is not a magic wand, and you cannot expect a “one click” solution. However, giving developers tools that enhance code understanding at the project level enables them not only to work faster but also to tackle tasks that were previously unsolvable.

For more information, please read the articles by my colleagues:

https://medium.com/@cyrilsadovsky/advanced-coding-chatbot-knowledge-graphs-and-asts-0c18c90373be

https://medium.com/@ziche94/building-knowledge-graph-over-a-codebase-for-llm-245686917f96

Stay tuned for more information. We will definitely share results from our research.
