2024-05-21: Trouble and Turmoil

armantark

Arman Tarkhanian

Posted on May 22, 2024

2024-05-21: Trouble and Turmoil

Last week was pretty hectic. We figured that the model was definitely overfitted, because we had written out data for only the beginnings of conversations, hoping that the "natural" instincts of GPT-3.5 would be able to take over after that point, but unfortunately it would continue questioning in an endless cycle. There were even spelling mistakes and other typos, which, while I did not notice while reviewing the data, may have ultimately come from there.

Another major issue was that the model had a relatively large validation loss. I suspect this is because I was forced to do a 90/10 split on training vs. validation, and I didn't want to do k-fold cross-validation because I figured 85 data points would be enough (and it would be difficult to code up, unless there's a library out there specifically for OpenAI's fine-tuning tool); regardless, 9 validation conversations isn't enough internal testing to minimize any sort of loss. The numbers ultimately came out to 0.8240 training loss and 1.7145 full validation loss (oof).

That meant that the data was definitely overfitted, which we could discern almost immediately considering the AI would not steer course even after something completely irrelevant was thrown its way (e.g. literally correcting its grammar).

Anyway, so we set out to create far more data points for a second round of fine-tuning as well as making sure that there is a. a lot of validation data, b. lots of full conversations, not just beginnings, and c. ~200 new conversations.

So I generated all that plus a bunch of new categories for validation and handed it off to the product team so they could start crafting all of that. As I write, I'm still waiting on them to finish it.

We also found that we need to do a crunch on getting v0 (what they're calling alpha) out to demo to investors. The CEO was really pushing for two new "architectures" (basically what we're calling our proprietary applied prompt+fine-tune sets) in addition to porting to iOS and, of course, making it have our brand voice. Needless to say, this cause a bit of a riot because it was a lot of work to squeeze in within the span of a week, plus still none of us have gotten paid yet.

With the rest of the team, we planned out what we needed to do for the coming week. I also met with our new "fractional integrator" who was supposed to help streamline organizational processes, which we were definitely lacking in. I talked with her about some things I wanted to see so we could optimize communication internally with the engineering team, since it was really lacking in that front. I'm not sure how much I've mentioned it, but most of the engineers are working out of Europe, particularly Germany, and they are very much against having actual synchronous meetings, partly because of the time zone difference but mostly because I feel like they don't want to be collaborative and just work alone.

Anyway, this has been causing a lot of issues. They also don't really write any documentation of any sort, so all the code we have is just written as-is and is hard to navigate, even with chucking in everything into ChatGPT to help explain stuff. So when I talked to our integrator, I told her that we need a code style guide that must be strictly followed, and that I would take on the responsibility of writing that.

Since I've had some kinda-sorta downtime with waiting on the rest of the product team to write up that data, I've been also writing that code style guide, trying to include as much detail as possible so things don't get overlooked. I want to apply this to code reviews as well, so it can be enforced.

The integrator and I also discussed removing Jira from our workflow because it feels too detached from the product team, and so I think we're in the process of keeping it all unified on Notion.

We also worked out a "sprint" system for situations like this where we're in a time crunch, but not velocity-based. Either way, we're racing against a clock so that our CEO can demo the app to some big-name investors, who aren't even technologically-inclined.

Finally, we discussed removing a bunch of useless channels on Slack because they were clogging up the sidebar immensely. That one is just a routine maintenance thing.

Things were really coming along great, though, until randomly on Friday our head research scientist left because of an argument she had with the CEO, so now we're trying to scramble around to find a replacement (most likely) while also missing the core person of our product team to help write training data and guide the architectures.

So hopefully things get smoothed over and she comes back, or we find someone new quickly who can be brought in the loop really fast. Thankfully, though, it did buy us a week to organize all this new stuff that the CEO is asking from us, so at least that's going for us.

Anyway, that's all for now. Until next time, cheers.

πŸ’– πŸ’ͺ πŸ™… 🚩
armantark
Arman Tarkhanian

Posted on May 22, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related