This week, as part of my class on Open Source Development, I worked together with my classmate Mayank to contribute to each other's projects. Mayank's project, dev-mate-cli, is a command-line (CLI) tool that uses AI to read source code files and generate comments that make the code easier to understand. For reference, my project, codeshift, is also an AI-powered CLI tool, but it translates source code files into other programming languages. Our task was to add an option to each other's tool that displays the number of tokens used to transform the source files. Tokens, in this context, are the chunks of text the AI processes to generate its output.
In order to contribute to each other's projects, we were asked to file issues and pull requests. I had some prior experience with creating issues and pull requests, but this week, I became very familiar with the process.
My Contributions
The contributions I made to Mayank's project were:
Implementing the --token-usage feature
Importing program metadata for --version and --help from package.json
Filing issues for potential improvements
Reading Mayank’s Code
Before creating an issue for the feature, I began by reading Mayank's code. Mayank used TypeScript for his project, which I have only used occasionally. His code was also split into several modules, a structure I found intimidating at first. I figured a good starting point would be to find the file where the API call to the AI was made, since I'd need the data returned from it. Aside from the feature I had to add, I noticed a few improvements that could be made to the project, and noted them down so I could create issues later.
Adding --token-usage to dev-mate-cli
To create his CLI tool, Mayank used the same npm module as I did: commander.js, a module for creating CLI tools with features like option and argument parsing. This made it easy for me to jump in and identify exactly where I needed to add my code. I had to register the -u/--token-usage option, check if it was passed, and if it was, print the token usage property returned by the API. Simple enough. I also made sure to stick as closely as possible to the project's code style.
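The three steps above (register, check, print) can be sketched roughly as follows. This is not Mayank's actual code: the `TokenUsage` field names, the `CliOptions` interface, and the `formatTokenUsage` helper are assumptions made for illustration, and the commander.js call is shown only in a comment so the snippet runs without any dependencies.

```typescript
// Assumed shape of the usage data returned by the API (OpenAI-style field names).
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// With commander.js, registering the flag would look roughly like:
//   program.option('-u, --token-usage', 'report token usage to stderr');
// after which program.opts() exposes it as the camelCased boolean `tokenUsage`.
interface CliOptions {
  tokenUsage?: boolean;
}

// Hypothetical helper: build the report text, or return null when the flag
// is not set or the response carried no usage data.
function formatTokenUsage(options: CliOptions, usage?: TokenUsage): string | null {
  if (!options.tokenUsage || !usage) return null;
  return [
    "Token usage:",
    `  prompt: ${usage.prompt_tokens}`,
    `  completion: ${usage.completion_tokens}`,
    `  total: ${usage.total_tokens}`,
  ].join("\n");
}

// Stand-in values; a real run would take these from program.opts() and the API response.
const report = formatTokenUsage(
  { tokenUsage: true },
  { prompt_tokens: 120, completion_tokens: 80, total_tokens: 200 },
);
if (report !== null) {
  console.error(report); // stderr keeps the report separate from the generated output
}
```

Keeping the formatting in a small function like this makes the flag check easy to test without calling the API.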
I created the issue on GitHub and felt satisfied, but after seeing how detailed some of the issues other people created were, I feel like I should have gone into more detail. My pull request's description could have included additional context to explain the changes better.
In the future, I'll make sure to add more detail explaining the context, expected behavior, and test cases for the changes, so collaborators can better understand the purpose of the pull request.
Add a new command-line flag: --token-usage or -t. When the program is run with the --token-usage/-t flag set, extra information will be reported to stderr about the number of tokens that were sent in the prompt and returned in the completion.
I'd like to work on this feature. Please let me know if you have any specific implementation guidelines.
After I submitted my pull request, Mayank approved it quickly. Since the lab also required practicing requesting changes on pull requests, I submitted a second pull request based on one of the improvements I had noted earlier: importing the program name and version from package.json so they can be printed with the --version flag. This way, updating them no longer means editing multiple files. I opened the pull request unsure of what I'd be asked to change. Mayank made the excellent observation that the program description used in the help message could be imported from the same file, and asked me to add that change to the pull request, which I did.
For this change I had to edit the TypeScript configuration file tsconfig.json to allow importing .json files. It was fun learning a little bit more about how TypeScript works and how it can be configured.
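For anyone curious, the TypeScript compiler option that permits importing .json files is `resolveJsonModule`. The sketch below shows the general idea under a few assumptions: the import path, the `PackageMeta` interface, and the `versionLine` helper are hypothetical, and a literal object stands in for the real package.json so the snippet runs on its own. Depending on the module settings, a default import may also need `esModuleInterop`.

```typescript
// tsconfig.json (assumed): { "compilerOptions": { "resolveJsonModule": true } }
// With that option enabled, the metadata could be imported directly:
//   import pkg from "../package.json"; // path is an assumption
// and handed to commander.js, for example:
//   program.name(pkg.name).description(pkg.description).version(versionLine(pkg));

// Only the fields the CLI needs; a real package.json has many more.
interface PackageMeta {
  name: string;
  version: string;
  description: string;
}

// Stand-in for the imported package.json (values are placeholders).
const pkg: PackageMeta = {
  name: "dev-mate-cli",
  version: "0.0.0",
  description: "placeholder description",
};

// Hypothetical helper: the text printed for --version, including the program name.
function versionLine(meta: PackageMeta): string {
  return `${meta.name} ${meta.version}`;
}

console.log(versionLine(pkg));
```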
Results
I was satisfied with my work on the --token-usage feature, but I realized later that it wasn't working as intended. Instead of printing the total tokens used for the entire command at the very end, it would print the tokens used after each input file was processed. This implementation works, but it's not quite what I envisioned. I let Mayank know, and thankfully, he seemed okay with the implementation.
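To illustrate the difference, here is a minimal sketch of the behavior I had in mind: collect the usage from each file, then report a single total at the end. The `TokenUsage` field names and the `sumUsage` helper are assumptions for illustration, not code from either project.

```typescript
// Assumed shape of the usage data reported for one processed file.
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical helper: add up the usage reported for every input file.
function sumUsage(perFile: TokenUsage[]): TokenUsage {
  const total: TokenUsage = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
  for (const usage of perFile) {
    total.prompt_tokens += usage.prompt_tokens;
    total.completion_tokens += usage.completion_tokens;
    total.total_tokens += usage.total_tokens;
  }
  return total;
}

// Stand-in numbers for two input files.
const perFileUsage: TokenUsage[] = [
  { prompt_tokens: 100, completion_tokens: 40, total_tokens: 140 },
  { prompt_tokens: 60, completion_tokens: 25, total_tokens: 85 },
];

// Print once, after all files are done, instead of once per file.
const total = sumUsage(perFileUsage);
console.error(
  `Total tokens: ${total.total_tokens} (prompt ${total.prompt_tokens}, completion ${total.completion_tokens})`,
);
```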
Filing More Issues
I also filed a few other issues on the project:
Suggested adding a .env.example file with placeholder environment variables for easier setup.
Proposed moving the list of future enhancements from the README to GitHub issues.
Noted that the project's --version command didn’t print the program name, which I fixed in my second pull request.
Having integrated the --token-usage feature in Mayank's project, it was time to have him contribute to my project. He helped out by contributing the following:
Added the --token-usage feature.
Fixed code style inconsistencies based on my feedback.
Addressed a major bug with output file overwriting.
Adding --token-usage to codeshift
Mayank's changes were pretty good for the most part, barring some code style inconsistencies, which I asked him to change when reviewing the pull request. His implementation of --token-usage was similar to mine, meaning the output was logged after each file. Since this was unintentional on my part, I ended up changing the code to work as I originally intended. Mayank was kind enough to also work on a couple of other issues I had open.
This pull request adds a new --token-usage flag (with a shorthand -t option) to the program, allowing users to see the token usage information when making requests to the API. This flag enables the program to display the number of prompt tokens, completion tokens, and total tokens used by the request.
Changes Made:
Added both a long flag --token-usage and a short flag -t to the program for reporting token usage.
Added logic to check for the --token-usage flag in the program. If the flag is present and the response contains token usage data, the program will now extract and display it in the console using console.error.
The relevant data is extracted from chunk?.x_groq?.usage in the response, following GROQ API's response structure.
Updated the README.md file to document the new --token-usage (-t) option.
Notes:
No breaking changes were introduced in this pull request.
This PR closes #10 by adding the requested token usage option.
Please let me know if there are any additional changes required.
This pull request modifies the program to append data to the output file when multiple input files are provided and the --output flag is used. Previously, the program would overwrite the output file with each input file’s data. Now, it appends the results from each input to the specified output file.
Changes Made:
Updated the logic to use fs.appendFile instead of fs.writeFile when multiple input files are passed along with the --output flag. This ensures that data from all input files is written sequentially to the same output file, rather than overwriting previous content.
Added logic to output a warning message if the output file is not empty. File is checked using fs.readFile for empty check.
This fixes #12, let me know if further changes are required.
Mayank's pull requests were clear and straightforward. It was easy to understand the intentions behind the changes he made.
Fixing Output File Overwriting Bug
I was glad Mayank decided to help out and take on additional issues. He helped me identify a major bug in my program: When specifying a file to output to, the output from the last input file would overwrite that of the previous files. He fixed this by making it append to the specified output file, and also added a console warning message.
When creating my project, I aimed for the tool to behave like Unix commands (mv and cp), which remain silent unless errors occur. While Mayank’s fix improved usability by adding feedback, I felt it diverged from the tool’s overall intended behavior. After reflecting on both approaches, I modified the program to preserve my original goal of overwriting content while incorporating Mayank’s fix.
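One way to combine the two behaviors is sketched below: silently clear the output file once, then append the result for each input. This is an assumption about how such a fix could look, not the exact code in codeshift, and the `writeResults` helper and file names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write the results for all input files to one output file.
function writeResults(outputPath: string, results: string[]): void {
  // Truncate (or create) the file once, silently, the way cp or mv would.
  fs.writeFileSync(outputPath, "");
  for (const result of results) {
    // Append each result so later inputs no longer overwrite earlier ones.
    fs.appendFileSync(outputPath, result + "\n");
  }
}

// Demo in a temporary directory, starting with leftover content from an earlier run.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "codeshift-sketch-"));
const out = path.join(dir, "out.txt");
fs.writeFileSync(out, "stale content from an earlier run\n");

writeResults(out, ["// translated a.js", "// translated b.js"]);
console.log(fs.readFileSync(out, "utf8"));
```

The old content is replaced without a warning, but the output for every input file in the current run is kept.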
Conclusion
Despite the challenges I faced, I’m proud of the experience I gained and the contributions I was able to make. These lessons will undoubtedly help me become a more effective open-source contributor in the future.
In summary, this experience gave me not only a deeper understanding of creating and reviewing pull requests, but also a better appreciation for the importance of careful code reviews and testing. I feel much more confident in reading and working with other people's code, but I need to be more critical of both the changes I make and the changes I accept. I look forward to applying these lessons to future contributions as I continue to grow both as a developer and a collaborator.