Parsing Some TOML

udayrana

Uday Rana

Posted on October 4, 2024


This week I added support for a TOML config file to a command-line tool called Scrappy.

Scrappy



Scrappy is a command-line tool that converts any website that can be scraped into Markdown.


How to use scrappy

  1. Download the repo.
  2. Run the following commands, replacing <PATH> with the location of the repo. Run them with sudo if you hit a permission issue.
npm i
chmod +x /<PATH>/Scrappy/src/args/command.js
ln -s /<PATH>/Scrappy/src/args/command.js /usr/local/bin/scrappy
  3. You will need a Groq API key to convert a page to Markdown. Once you obtain your key, run the following command to save it on your system.
scrappy --api-key <YOUR_API_KEY>
or
scrappy --a <YOUR_API_KEY>

Config

To set default options and arguments, you can create a .scrappy.toml file in your home directory ~/ with the following config options:

url = "some_url"
inputFile = "some_input_file"
outputFile = "some_output_file"
tokenUsage = true | false
stream = true | false

Features

  • Input: The main feature is that you can convert any…

The tool is authored by my classmate Krinskumar, or Krins for short.

The Programming

The options I added support for in the config file were: an input URL, an input file, an output file, toggling streaming, and toggling token usage data.
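As a rough illustration of those five keys, here is a minimal sketch that keeps only the recognized options from an already-parsed config object and drops anything with an unexpected type. The helper name `pickKnownOptions` and the `EXPECTED_TYPES` table are hypothetical. The pull request itself returns the parsed object as-is, so this is an optional extra step, not what Scrappy does.

```javascript
// Hypothetical table of supported config keys and the type each should have.
const EXPECTED_TYPES = {
  url: "string",
  inputFile: "string",
  outputFile: "string",
  tokenUsage: "boolean",
  stream: "boolean",
};

// Hypothetical helper: copies over only known keys whose values have the expected type.
function pickKnownOptions(parsed) {
  const options = {};
  for (const [key, type] of Object.entries(EXPECTED_TYPES)) {
    if (typeof parsed[key] === type) {
      options[key] = parsed[key];
    }
  }
  return options;
}

// Example: `stream` has the wrong type and `color` is not a supported key.
const sample = pickKnownOptions({ url: "https://example.com", stream: "yes", color: "red" });
console.log(sample);
```

Silently dropping bad values is only one possible policy. Reporting them to the user would be just as reasonable.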

I also updated the README with information on how to use the config file.

feat: add support for `~/.scrappy.toml` config file #9

Description

Fixes #8. Enables parsing of a config file at ~/.scrappy.toml with support for the following keys:

  • url
  • inputFile
  • outputFile
  • tokenUsage
  • stream

Summary of Changes

  1. Installed npm package smol-toml

  2. Added an async function parseConfig(configFilePath) to handle parsing the config file.

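    // Imports this snippet relies on (assumed; the PR excerpt does not show them).
    import * as fsP from "fs/promises";
    import * as TOML from "smol-toml";

    /**
     * Parses a TOML config file for options
     * @param {string} configFilePath Path to .toml config file
     * @returns {Promise<{ url: string | undefined, inputFile: string | undefined, outputFile: string | undefined, tokenUsage: boolean | undefined, stream: boolean | undefined }>} A promise that resolves to an object containing the parsed options
     */
    export async function parseConfig(configFilePath) {
      // Read the whole file as text; rejects if the file does not exist.
      const configFileContent = await fsP.readFile(configFilePath, {
        encoding: "utf8",
      });
      // TOML.parse throws if the content is not valid TOML.
      const configOptions = TOML.parse(configFileContent);
      return configOptions;
    }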
  3. Made validateArgs() an async function so it can call the above function. Added a try/catch around the call which catches parsing errors but lets the program continue if the file doesn't exist.

  4. Modified argument/config parsing flow:

    1. Use the URL CLI option if present
    2. If missing, use the URL config option if present
    3. If missing, use the input file argument if present
    4. If missing, use the input file config option if present
    5. If missing, exit
  5. Added comments explaining above flow

  6. Added config section to README

Notes

I used the fs/promises module in order to take advantage of async/await syntax, while the rest of the program uses fs. Let me know if you'd like this changed.

View on GitHub: https://github.com/KrinsKumar/Scrappy/pull/9


The hardest part was understanding the control flow for parsing arguments since Krins's program manually handled argument parsing with conditionals as opposed to working with a library like yargs or commander. I wrote it out to help me understand, and added comments explaining it.
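The precedence listed in the pull request (URL from the CLI, then URL from the config, then input file from the CLI, then input file from the config, otherwise exit) can be sketched roughly like this. This is not Scrappy's actual code: the function name `resolveInput` and the shape of the `cli` and `config` objects are assumptions made for illustration.

```javascript
// Hypothetical helper: picks the input source following the documented precedence.
// `cli` holds values given on the command line, `config` holds values from the config file.
function resolveInput(cli, config) {
  if (cli.url) return { kind: "url", value: cli.url };
  if (config.url) return { kind: "url", value: config.url };
  if (cli.inputFile) return { kind: "file", value: cli.inputFile };
  if (config.inputFile) return { kind: "file", value: config.inputFile };
  // Nothing to work with: the caller is expected to print usage and exit.
  return null;
}

// Example: a URL in the config file wins over an input file given on the command line.
const chosen = resolveInput({ inputFile: "page.html" }, { url: "https://example.com" });
console.log(chosen);
```

Returning `null` instead of calling `process.exit()` inside the helper keeps the decision testable and leaves the exit to the caller.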

Another thing I ran into trouble with was the fs module in Node. I tried using async/await syntax with it because I remembered doing the same for my project, but it wouldn't work. Turns out I was confusing it with the fs/promises module. Although Scrappy already imported fs, I went ahead and imported fs/promises anyway because it's easier to work with, and left a note in the pull request about the new import.
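To make the difference between the two modules concrete, here is a small sketch that reads the same file both ways. The wrapper names `readWithCallback` and `readWithPromise` are made up for this example. `fs.readFile` and the `readFile` exported by `fs/promises` are the real Node APIs.

```javascript
import fs from "node:fs";
import { readFile } from "node:fs/promises";

// Callback API from fs: the result is handed to a callback,
// so there is no promise here to await.
function readWithCallback(filePath, done) {
  fs.readFile(filePath, { encoding: "utf8" }, done);
}

// Promise API from fs/promises: readFile returns a promise,
// which is what makes async/await work.
async function readWithPromise(filePath) {
  return await readFile(filePath, { encoding: "utf8" });
}
```

With the callback version, error handling happens through the first callback argument. With the promise version, a failed read rejects and can be caught with try/catch.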

I had a decent idea of what I needed to do because somebody else had already added the feature to my project. For example, since the person contributing to my project used fs/promises too (because my code already used it), I knew which error code (ENOENT) the thrown error would have if the config file wasn't found.
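A minimal sketch of that pattern, assuming a missing config file should simply be ignored while every other failure is surfaced. The function name `readConfigIfPresent` is hypothetical and this is not the code from the pull request. It relies on the `code` property that Node sets on file system errors, which is `"ENOENT"` when the path does not exist.

```javascript
import { readFile } from "node:fs/promises";

// Hypothetical helper: returns the file's text, or null when the file does not exist.
async function readConfigIfPresent(configFilePath) {
  try {
    return await readFile(configFilePath, { encoding: "utf8" });
  } catch (err) {
    // A missing config file is not an error here: the program just runs without it.
    if (err.code === "ENOENT") {
      return null;
    }
    // Anything else (permissions, path is a directory, ...) is rethrown.
    throw err;
  }
}
```

Checking the error code rather than the message keeps the check independent of how the message is worded.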

I also used a JSDoc-style comment for the function I added. I saw them in another assignment I've been working on and wanted to learn how they work so I could comment my code better. The comment let me add a description and specify parameter and return value types, which VS Code's IntelliSense recognizes, making the function easier to work with. I think it's super helpful being able to view how a function works by hovering over it in the editor, so adding that capability to code I worked on felt good.
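For a sense of what such a comment looks like, here is a small standalone sketch. It uses a `@typedef` to name the option object once so the type can be reused instead of being spelled out inline. The type name `ConfigOptions`, the function `withBooleanDefaults`, and the choice of `false` as a default are all assumptions for illustration and are not part of Scrappy.

```javascript
/**
 * Options that may appear in the config file (every key is optional).
 * @typedef {Object} ConfigOptions
 * @property {string} [url] Input URL
 * @property {string} [inputFile] Path to an input file
 * @property {string} [outputFile] Path to an output file
 * @property {boolean} [tokenUsage] Whether to show token usage data
 * @property {boolean} [stream] Whether to stream the output
 */

/**
 * Fills in `false` for boolean options that are absent from the config.
 * @param {ConfigOptions} config Parsed config options
 * @returns {ConfigOptions} A copy of `config` with defaults for missing toggles
 */
function withBooleanDefaults(config) {
  // Keys present in `config` override the defaults listed first.
  return { tokenUsage: false, stream: false, ...config };
}

console.log(withBooleanDefaults({ url: "https://example.com", stream: true }));
```

Hovering over `withBooleanDefaults` in an editor that understands JSDoc should show the description along with the parameter and return types.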

Using git remote

I also received a pull request from my classmate Harshil to add the feature to my project.

This week, I learned how to use git remote add to add the contributor's fork as a remote to my local repository. In the past, when working with other people's forks, I'd always clone them to a separate folder, but when trying this method while reviewing this pull request, I realized doing it this way was much faster and fairly straightforward. I'll definitely be using this method in the future - it should help save quite a bit of time.

That's it for this post, thanks for reading!
