Y.K. Goon
Posted on June 19, 2024
I was trying to catch a glimpse of the future of programming alongside LLMs. I think I ended up discovering a whole new art.
When you bring up AI and coding assistants, most people think of GitHub Copilot and similar alternatives. By now I can confidently say this: code completion is not the future. At best, it's just cute.
The discussion around this among seasoned coders is complicated. It's not that we're unwilling to use Copilot. But when we know our turf well, having code completion as an assistant gets in the way half the time. So much of it requires learning a different workflow to complement our existing one. By the time I've explained enough to the machine, I could've coded the solution myself. It's unclear if adopting a different workflow is worthwhile.
On the other hand, there's a sense that a lot of the reluctance comes from ignorance of what these tools can achieve. It's similar to learning vim keybindings: I hesitated for many years, but having suffered through the learning curve, I now swear by them.
So I put in some time to explore something entirely different. Instead of code-completion tools, I looked at coding assistants that live up to the name. I narrowed the field down to two: Mentat and Aider.
Mentat
I tried Mentat first, the seemingly smaller project of the two. The demo looks promising; you should take a look first.
It's a terminal-based application. Installation via pip is easy. It's built with the Textual TUI framework, which is a nice touch.
The UX had me at hello. It doesn't try to code with me in Emacs. Instead, I tell it what I want and it tries to deliver the changes in the right places across the project.
To get it to work, I hooked up Mentat to use a coding LLM by Phind hosted by Together AI.
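For the record, the hookup is short. Here is a rough sketch of what it looks like; the package is on PyPI, but the endpoint, environment variable names, and file paths are assumptions or placeholders from memory, so check Mentat's docs rather than treat this as its exact configuration interface:

    # install Mentat from PyPI
    pip install mentat

    # point it at an OpenAI-compatible endpoint serving Phind's code model on Together AI
    # (variable names assumed; confirm against Mentat's documentation)
    export OPENAI_API_BASE="https://api.together.xyz/v1"
    export OPENAI_API_KEY="<your Together AI key>"

    # run Mentat against the files you want it to edit
    mentat src/module.py tests/test_module.py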
Next I had to pick a problem domain. This was my first mistake: I tried using it to solve a bug from my day job, which meant putting it to work on a code base that is nine years old by now.
That blew through any available context window from the get-go.
See, when working with Mentat we get to specify the relevant files to work on. The machine's code changes happen in those files, and those files get submitted to the LLM as context (possibly on top of git logs too).
A single Python test file of mine runs up to 3,000 lines, easily. No LLM would want to entertain that.
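A back-of-envelope estimate makes the problem obvious: at a rough 10 to 15 tokens per line of Python, a 3,000-line file comes to somewhere around 30,000 to 45,000 tokens on its own, before Mentat adds anything else to the prompt. Many of the code-oriented models available at the time offered context windows in the 8k to 32k token range, so a single test file could already blow the budget.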
This obstacle got me thinking about fine-tuning a model on my entire code base, or some solution involving RAG. That can get quite involved and feels premature. Before going there, I might as well try Aider first; I shall circle back to Mentat in the future.
Aider
Watch the demo first.
The UX and concepts involved here are similar to Mentat. The difference, though, is that Aider supports Google's Gemini, which has the largest context window out there. If it can't handle my code base, nobody can.
And indeed it could not. I did the setup (similarly via pip), worked on the same files from my large code base, and Gemini refused to return anything at all.
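For completeness, the Aider setup is about as brief. The sketch below reflects how Aider addressed Gemini around that time (an API key from Google AI Studio plus a model switch); the file paths are placeholders and the exact model identifier may have moved on since:

    # install Aider from PyPI
    pip install aider-chat

    # Gemini access uses an API key from Google AI Studio
    export GEMINI_API_KEY="<your key>"

    # start Aider on the files in question, asking for Gemini 1.5 Pro
    aider --model gemini/gemini-1.5-pro-latest path/to/module.py path/to/test_module.py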
By now I suspected I was making it do things it's not designed for. Most demos like this start idealistically, without the burden of a nine-year-old code base. So I pulled something out of my idea bank (things I wanted to code but never got around to) and had Aider build it from scratch. This time Aider worked as advertised.
This project is a web browser extension meant to render web pages inside a 3D scene, to be used within a VR device. The details of the application are immaterial. What matters is that it uses Three.js and various pieces of the JavaScript stack, which I'm not invested in and therefore out of my depth with.
From the get-go Aider created the entire set of boilerplate files, enough for it to work as an empty browser extension. I subsequently spent the whole day working with Aider to get the project to a point where it successfully integrated Three.js.
Now I can start reflecting on the experience.
What it's really like
Without Aider, a substantial amount of my time would've been spent yak shaving: setting up manifest files by hand, configuring things, getting them wrong, and Googling back and forth. All of this is low-value work that makes sense to hand to a machine. I wouldn't have taken the project this far in one day coding it myself.
The real action takes place after the first hour. I made a point of telling it what I wanted the way I would tell a junior coder, sparing it from making assumptions. That worked out well.
When it gets things wrong, it needs help correcting its own mistakes. Chances are it's because I was not specific enough about what I was asking for.
When Aider did something wrong without my knowing, I didn't know enough to correct it and assumed it was correct. Further work got built on top of that mistake and cascaded into larger mistakes.
There are two facets to mistakes. When Aider makes mistakes on its own, it needs a human's help in pointing them out, and doing so means being specific about the solution. Just saying the outcome is wrong is not helpful.
Secondly, the reason I was not specific enough in my requests was that I didn't know enough about the intended solution to ask for it. Aider therefore does not free you from knowing your stack and its technical intricacies.
About testing: this is highly domain-specific. Had I been doing backend work, I would've had Aider code my test cases for me. Mine, however, is a VR project, so it's still down to me to test by clicking around in the browser. I think in most projects, Aider will end up encouraging a test-driven approach by making test cases easy to create.
With coding assistants, it's not the case that you ask for the result and it delivers the solution. For any non-trivial problem, you have to iterate with it to arrive at the right solution. So until machines can reason on their own, the human is the reasoning component in this loop.
Like most new skills, learning to get good at working with coding assistants will make you slower before it makes you faster.
Which leads me to declare this: AI-assisted coding is an entirely different art. It's not better than classical coding (a term I have to coin here), and it's not worse either. It's different the way Judo and Muay Thai are different; comparison is unfair without context.
Classical vs Assisted
Now that I've established two different approaches to coding, I can engage in some speculation.
Here's an easy one: assisted coding works well on popular programming languages (simply because LLMs are well-trained on them). Projects in artisanal languages (let me introduce you to Hoon) have no choice but to be handcrafted the classical way.
Classical coders are about how; assisted coders are about what. Consequently, assisted projects achieve their objectives faster, but classical projects are easier to maintain.
Should any given software project in the future be done with a mixture of the assisted and classical approaches? I suspect not, in that if a code base is assisted code to begin with, there should be minimal classical intervention.
Conversely, a classical code base should not be tainted by assisted commits. Even if this has no quality implications, I think it will be socially demanded by team members.
I can't qualify this point beyond falling back to my intuition, but this aspect will be interesting to observe.
I wonder how collaboration works differently for an assisted project. Would the problems of a typical FOSS project still exist? If not, is the same pull-request workflow of a classical project still relevant?
The final point is how the physical limits of LLMs affect engineering approaches. Let's assume there will always be a limit to context windows in LLMs, no matter how much fine-tuning and RAG are thrown at the problem.
I think assisted projects are likely to discourage monoliths. Because an LLM can't fit a big monolith in its figurative head, humans work around it by breaking the system into pieces. The result ends up looking like microservices, whether the problem domain demands it or not.
Some may argue that's universally a good thing. That remains to be seen.
Going forward
This will be ongoing research. I hope to see my toy project through to the end.
I may try Mentat again on a new project at some point.