This week's objective in our Open Source course was to contribute to someone else's code by implementing a new feature and submitting a Pull Request.
The feature we were tasked with implementing is a new command-line flag, --token-usage/-t, that lets users quickly check the LLM's token usage information.
Choosing a project to contribute to
This time around, I wanted a smoother collaboration experience, so I was pickier when choosing a repo to work on.
I looked for a project written in the same language as my CLI tool and made sure the repository was actively maintained (with recent issue closures and code pushes) to increase the chances of getting timely feedback.
After going through my classmates' repos, I found a perfect candidate for contribution, a repo with clear code written in Python:
CodeMage is a tool that translates a source file written in one programming language into another language
The translation will be done by Large Language Model AI(such as ChatGPT)
Release 0.1
Features
1. Supported Languages: Javascript, Python, C++, Java
2. Default target Language is Python
After selecting the project repository and confirming that no one else was working on it, I started my contribution.
Here is an overview of the steps I took to contribute a code change.
Creating an Issue
First off, I familiarized myself with the project structure and filed a GitHub issue proposing to implement the --token-usage/-t option flag, giving a concise description and explaining how the feature can be implemented (Adding a feature: token info flag option #7).
Fork, Clone, Branch
Then I forked the repo, cloned the fork to my PC, and created a feature branch to avoid interfering with the owner's code on the main branch.
Writing the code to implement the change
Before making any changes, I read through the project files to get a complete understanding of the logic and a sense of the code formatting style.
While studying the code, I found that Jin (the owner of the repo) had already added the code for parsing the --token-usage/-t option flag from the command line.
So I just needed to figure out how to retrieve the token usage information from OpenRouter (the LLM provider that Jin used).
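For readers who have not seen this kind of flag parsing before, here is a minimal sketch of how a boolean --token-usage/-t switch could be declared with Python's argparse. This is not Jin's actual code: the program name, the source_files argument, and the build_parser helper are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a boolean --token-usage/-t switch (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="codemage")
    # Hypothetical positional argument: the files to translate.
    parser.add_argument("source_files", nargs="+", help="files to translate")
    parser.add_argument(
        "-t",
        "--token-usage",
        action="store_true",  # False unless the flag is present
        help="print token usage information to stderr",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["example.js", "-t"])
print(args.token_usage)
```

With action="store_true", argparse exposes the flag as args.token_usage, which the rest of the program can check before printing anything.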
I added the code to extract the token usage info from the completion object and print it to stderr if the --token-usage/-t flag was provided.
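A rough sketch of that idea is below. It assumes the completion object follows the OpenAI-style response shape, with a usage field carrying prompt_tokens, completion_tokens, and total_tokens. The function names and the stand-in completion object are hypothetical, and the code in the actual PR differs.

```python
import sys
from types import SimpleNamespace


def format_token_usage(completion) -> str:
    """Summarize the token counts carried on a completion's `usage` field."""
    usage = completion.usage  # assumed OpenAI-style usage object
    return (
        "Token usage:\n"
        f"  prompt:     {usage.prompt_tokens}\n"
        f"  completion: {usage.completion_tokens}\n"
        f"  total:      {usage.total_tokens}"
    )


def report_token_usage(completion, enabled: bool, stream=None) -> None:
    """Write the summary to stderr (or `stream`) only when the flag was set."""
    if not enabled:
        return
    out = stream if stream is not None else sys.stderr
    print(format_token_usage(completion), file=out)


# Stand-in for a real API response, so the sketch runs without network access.
fake_completion = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=120, completion_tokens=45, total_tokens=165)
)
report_token_usage(fake_completion, enabled=True)
```

Sending the report to stderr keeps stdout clean, so the translated code can still be piped or redirected to a file without the usage summary mixed in.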
The main issue I encountered was that the token counts always came back as 0.
After referencing the docs and printing out the completion object, I realized that the chosen model was not returning completion token details (completion_token_details=None).
To address this, I added another check to see whether completion token details are present in the response, so that users get more comprehensive logging. I also mentioned the problem in my PR and filed an issue to add a feature allowing users to pick a model, so they can select one that provides token usage info in the completion response (Adding a feature: --model flag option #9).
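The kind of defensive check described above could look something like the sketch below. It is an illustration under assumptions, not the code from the PR: the usage_lines helper is hypothetical, and the completion_token_details attribute name is taken from the output quoted in this post, so it should be verified against the client library's response model.

```python
from types import SimpleNamespace


def usage_lines(completion) -> list[str]:
    """Build log lines, noting explicitly when the model omits usage data."""
    usage = getattr(completion, "usage", None)
    if usage is None:
        return ["Token usage: not reported by this model"]

    lines = [f"Total tokens: {usage.total_tokens}"]
    # Attribute name as quoted in the post; treat it as an assumption.
    details = getattr(usage, "completion_token_details", None)
    if details is None:
        lines.append("Completion token details: not provided by this model")
    else:
        lines.append(f"Completion token details: {details}")
    return lines


# Stand-in response that mimics a model returning no completion token details.
sparse = SimpleNamespace(
    usage=SimpleNamespace(total_tokens=0, completion_token_details=None)
)
for line in usage_lines(sparse):
    print(line)
```

Telling the user that the details are missing, rather than silently printing zeros, makes it clearer that the limitation comes from the model and not from the tool.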
For more details about the changes I made, you can check out my Pull Request.
Updating the Docs
After finishing the code, I documented the new option at the bottom of README.md.
Making a Pull Request
The moment of truth arrived when I submitted a Pull Request, thoroughly describing and explaining all the changes I had made (My Pull Request).
Not long after, I received feedback asking me to change the output style from print("-", file=sys.stderr) to sys.stderr.write(""). After a few quick adjustments, I pushed the updated code, and shortly afterward it was merged into the main branch.
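The two styles are nearly interchangeable. The practical difference is that print appends a newline by default, while sys.stderr.write emits exactly the string it is given. The small sketch below, with hypothetical helper names, captures stderr to show that both produce the same output once the newline is added by hand.

```python
import io
import sys
from contextlib import redirect_stderr


def emit_with_print(message: str) -> None:
    """print() adds the trailing newline automatically."""
    print(message, file=sys.stderr)


def emit_with_write(message: str) -> None:
    """write() emits the string as-is, so the newline must be explicit."""
    sys.stderr.write(message + "\n")


def capture(emit, message: str) -> str:
    """Run `emit` with stderr temporarily redirected and return what it wrote."""
    buffer = io.StringIO()
    with redirect_stderr(buffer):
        emit(message)
    return buffer.getvalue()


printed = capture(emit_with_print, "Total tokens: 165")
written = capture(emit_with_write, "Total tokens: 165")
print(printed == written)
```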
Learning Outcomes
In this lab, I had the opportunity to go through the full workflow of contributing to an open-source project and making a pull request in practice. The overall experience was great :)