To do this, I chose Jest because it is the most popular testing framework for JavaScript and a mature technology, which means there is plenty of great documentation, lots of examples, and a large ecosystem around it.
An alternative I considered was Vitest, because the last time I used Jest I had to figure out how to deal with TypeScript and ES modules. I was talking to my friend Vinh, who wanted to set up Jest on his project too; his project uses ES module syntax, so he ran into trouble because Jest's support for it is still experimental. But since my project uses CommonJS syntax and Jest is still much more widely used, I decided to stick with it.
My program uses the OpenAI client to interface with various LLM providers, including OpenRouter, Groq, and OpenAI's GPT models. To test this functionality, we were encouraged to use an HTTP mocking library like Nock, but it made more sense to me to mock the OpenAI client in Jest using jest.mock. To do this, I had to move the initialization of the OpenAI client into a separate file so that the instance could be imported and mocked in tests.
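A minimal sketch of that setup (the file names are placeholders, and the factory mock is just one way to do it):

// openai-client.js (placeholder name): the one place the client gets created,
// so every other module imports this instance instead of constructing its own.
const OpenAI = require("openai");

const openai = new OpenAI({
  baseURL: process.env.BASE_URL,
  apiKey: process.env.API_KEY,
});

module.exports = openai;

// In a test file: replace the module with a factory mock, then control
// exactly what the "API" returns in each test.
jest.mock("./openai-client", () => ({
  chat: { completions: { create: jest.fn() } },
}));
const openai = require("./openai-client");

openai.chat.completions.create.mockResolvedValue({
  choices: [{ message: { content: "mocked response" } }],
});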
Learning from tests
While writing my tests, I ended up learning a few things about how my own code works. For example, I was passing multi-line template literals in my prompt to the LLM, and when testing the prompt-building function, I learned that all of the indentation and newlines were being passed along to the LLM. After a bit of research I found that the newlines can be escaped with a \ like in a Unix shell, but as for the indentation, there isn't much choice except to remove all indentation from the literal.
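Here's a small illustration of both points (the prompt text is made up):

// Indentation inside a template literal is sent verbatim, so the lines stay flush left.
// A trailing backslash is a line continuation: it removes the newline that follows it.
const inputText = "Some text to summarize.";
const prompt = `Summarize the following text \
in at most three sentences.
${inputText}`;
// prompt === "Summarize the following text in at most three sentences.\nSome text to summarize."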
I was using node:fs's stat() to check whether a config file existed before parsing it with readFile(), and it turned out this was redundant because both throw the same ENOENT error if the file doesn't exist.
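The simplified version ends up looking something like this (a sketch with placeholder names, assuming a JSON config and the promise-based fs API):

const fs = require("node:fs/promises");

// Skip the stat() existence check: readFile() already rejects with ENOENT
// when the file is missing, so one try/catch covers both cases.
async function loadConfig(configPath) {
  try {
    return JSON.parse(await fs.readFile(configPath, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return null; // no config file, fall back to defaults
    throw err;
  }
}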
I have a module that selects a default model based on a provider base URL specified in an environment variable. While writing tests for it, I was confused by the logic I had written in my own module. After thinking about it from the perspective of possible test scenarios, I was able to simplify the logic quite a bit, which also made it much easier to understand.
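For illustration, the simplified shape is something along these lines (placeholder URLs and model names, not my exact code):

// model-selection.js (placeholder name): pick a default model based on the
// provider base URL set in an environment variable.
function getDefaultModel() {
  const baseURL = process.env.BASE_URL || "";
  if (baseURL.includes("openrouter.ai")) return "openai/gpt-4o-mini";
  if (baseURL.includes("groq.com")) return "llama-3.1-8b-instant";
  return "gpt-4o-mini"; // default to an OpenAI model
}

module.exports = { getDefaultModel };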
Also, when trying to set values on process.env before each test in order to test the model selection module, I noticed that values on process.env set to undefined or null would evaluate as truthy. I wasn't sure why at first; it turns out assigning a property on process.env implicitly converts the value to a string, so undefined becomes the truthy string "undefined". I got around this by deleting the values before each test.
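A sketch of the cleanup in a test file (the module, variable, and model names are placeholders):

const { getDefaultModel } = require("./model-selection");

describe("default model selection", () => {
  beforeEach(() => {
    // Delete the key instead of assigning undefined, which would store "undefined".
    delete process.env.BASE_URL;
  });

  test("falls back to the default model when no base URL is set", () => {
    expect(getDefaultModel()).toBe("gpt-4o-mini");
  });
});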
I was worried that testing streamed responses from the LLM would be difficult, so without attempting it, I decided to make non-streamed responses the default and add a flag to request a streamed response. But I was able to test streamed responses successfully: for most tests, I could use arrays instead of streams, and to test what happens when reading the stream fails, I used an async generator function.
test("Should throw if error occurs reading response stream",async ()=>{consterrorCompletion=(asyncfunction*(){yieldnewError("Stream error");})();constexitSpy=jest.spyOn(process,"exit").mockImplementation();awaitwriteOutput(errorCompletion,"output.txt",true,true);expect(exitSpy).toHaveBeenCalledWith(23);exitSpy.mockRestore();});
While writing tests for the function that handles writing output, my tests helped me catch another edge case I'd missed while hastily making streamed responses optional: handling token usage for non-streamed responses.
if (streamResponse) {
  await processCompletionStream(
    completion,
    outputFilePath,
    tokenUsageRequested,
    tokenUsage,
  );
} else {
  // Forgot this part until I realized while writing my tests!
  const {
    prompt_tokens = 0,
    completion_tokens = 0,
    total_tokens = 0,
  } = completion?.usage || {};
  tokenUsage = { prompt_tokens, completion_tokens, total_tokens };
  if (outputFilePath) {
    await fs.writeFile(outputFilePath, completion.choices[0].message.content);
  } else {
    process.stdout.write(completion.choices[0].message.content);
  }
}
Conclusion
Even though I've done testing before in other courses and was already familiar with Jest, I'd never read the docs thoroughly or used the mocking functionality (when working with servers in the past, I'd used superagent), so I learned a lot working on these tests. I think mocking and setup/teardown are incredibly useful features to have when writing tests.
I find testing invaluable for making sure no regressions sneak into the codebase, especially when working on a large project or in a team, and it can save tons of time. For my own projects, I like to practice test-driven development, and I intend to continue doing so in the future.