Add check-spelling to a repository

jsoref

Josh Soref

Posted on December 3, 2021

Everyone wonders why I care about spelling. Sometimes the rewards are 🍺:

This typo is epic, you just made my day! 🤣

I never would've imagined that a MR with hundreds of typo fixes would have any real impact, but you made it happen. It turns out the EDNS keepalive module didn't work because of this typo (and no one noticed...) And you fixed it, thank you!

We owe you a beer! 🍻

Originally posted by @tomaskrizek in https://github.com/CZ-NIC/knot-resolver/pull/75#discussion_r752569877

Sadly, the spell checker shows that there just aren't enough 👀 ("Given enough eyeballs, all bugs are shallow."):

Oh wow! This is a lesser used portion of the Chocolatey application, but I would have still expected for this to have been caught before now. This change makes sense to me.

Originally posted by @gep13 in https://github.com/chocolatey/choco/pull/2466#discussion_r755506047

My Workflow

https://github.com/check-spelling/spell-check-this/blob/main/.github/workflows/spelling.yml

The workflow is built around

check-spelling / check-spelling

Spelling checker action

@check-spelling/check-spelling GitHub Action

Overview

Everyone makes typos. This includes people writing documentation and comments, but it also includes programmers naming variables, functions, APIs, classes, and filenames.

Often, programmers will use InitialCapitalization, camelCase, ALL_CAPS, or IDLCase when naming their things. This makes it much harder for naive spelling tools to recognize misspellings, and, faced with a really high false-positive rate, people tend not to enable spell checking at all.

This repository's tools are capable of tolerating all of those variations. Specifically, w understands enough about how programmers name things that it can split the above conventions into word-like things for checking against a dictionary.
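To make the splitting idea concrete, here is a minimal sketch in Python. It is not the tool's actual implementation; the function name `split_identifier` and the regular expression are assumptions chosen only to illustrate how the naming conventions above can be broken into word-like pieces before a dictionary lookup.

```python
import re

# Hypothetical pattern for illustration; check-spelling's own rules may differ.
# First alternative: a run of capitals not followed by a lowercase letter
# (so "IDLCase" yields "IDL" and leaves "Case" for the second alternative).
# Second alternative: an optional capital followed by lowercase letters.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(name: str) -> list[str]:
    """Split an identifier into lowercase word-like pieces."""
    return [piece.lower() for piece in _WORD.findall(name)]


if __name__ == "__main__":
    for sample in ("InitialCapitalization", "camelCase", "ALL_CAPS", "IDLCase"):
        print(sample, "->", split_identifier(sample))
```

Each resulting piece can then be checked against a dictionary on its own, which is what keeps the false-positive rate manageable for code.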

Spell Checker GitHub Actions

Spell checking

Sample output

Comment as seen in a PR

[Image: github action comment]

Comment as seen in a commit

[Image: github action annotation]

GitHub Action

Marketplace Actions Check spelling

Submission Category:

Maintainer Must-Haves

Yaml File or Link to Code

The workflow has a couple of interesting pieces:

  1. It can check both a push and a pull request.
  2. If there's an existing pull request, the action will skip checking the push to save computing resources.
  3. Permissions are reduced during the check phase for security reasons.
  4. It's possible to only check changed files.
  5. The action generates outputs, which lets one consume its results in ways I couldn't otherwise imagine.
  6. There's a separate comment phase as commenting requires additional permissions. (This is the default way that the workflow consumes the action's outputs and then sends them back to the action.)
  7. It's possible to ask the workflow to update its own metadata.
  8. When it updates the metadata, it collapses its original comment and your note to update the metadata.
  9. I recently added a note as part of this last phase linking users to a file they can edit in order to trigger a new validation pass.

Journey

I've been working on this tool for a while.

You can use it just for your documentation (as PowerDNS does), or you can use it for your entire project.

In the second half of this year, I significantly improved its performance, including allowing concurrency at the process level and via matrix runs. It wasn't bad for small to medium repositories, but it wasn't good enough for giant ones; it's now doing reasonably well there.

I've also been adding additional heuristics, such as recognizing when a file really isn't worth checking -- often it will identify translation files and binaries as things to skip. I ran across a couple of projects with source code files that were >10 MB, so there's logic to skip such files by default (you can tune the threshold).
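As a rough illustration of this kind of skip heuristic, here is a small Python sketch. Everything in it is an assumption made for the example: the function name `should_skip`, the 10 MB default, and the NUL-byte check (a common way to guess that a file is binary). The action's real heuristics and defaults may well differ.

```python
import os

# Assumed threshold for illustration only; the action's real default may differ.
DEFAULT_MAX_BYTES = 10 * 1024 * 1024


def should_skip(path, max_bytes=DEFAULT_MAX_BYTES, sniff_bytes=8192):
    """Return a reason string if the file looks not worth checking, else None."""
    # Very large files are skipped outright; the caller can tune max_bytes.
    if os.path.getsize(path) > max_bytes:
        return "too large"
    # A NUL byte near the start is a cheap hint that the file is binary.
    with open(path, "rb") as handle:
        if b"\0" in handle.read(sniff_bytes):
            return "looks binary"
    return None
```

Returning a reason rather than a bare boolean makes it easy to report why a file was skipped, which matters when someone wonders why their file wasn't checked.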

There are now heuristics to identify supplemental dictionaries one could use to reduce the size of the in-repository metadata. (The heuristics run if there are unrecognized terms, but you can turn them off if you don't want them.)

I'm slowly working on adjusting the action so that it could check other languages (hopefully next year).

I regularly feed projects to the template and am growing a list of pattern templates to make it easy for projects to trim out noise.

For medium-sized projects, the tool will regularly find bugs, whether it's a broken public API or a test that isn't testing what it thought it was testing. These are all normal occurrences. Consistent spelling helps projects avoid such pitfalls.

Hooks

Auto-detecting dictionary words

Because the workflow is customizable, I'm playing with instance-specific customizations such as this one for ohmyzsh/ohmyzsh. As ohmyzsh is built around aliases for zsh, this additional commit would automatically recognize aliases before the spell checker runs and add them to the dictionary. This means that if the alias is used in the documentation (and hopefully it is!), it'll be automatically accepted as a word, and when someone misspells an alias, that misspelling will stick out.

https://github.com/ohmyzsh/ohmyzsh/blob/e6657a8b5524c0a4893bd6bcbf9cf0234a07e155/.github/workflows/spelling.yml#L33-L40

    - name: find aliases
      run: |
        for a in $(git ls-files|grep '\.zsh$'); do
          echo "-- $a"
          if [ -s "$a" ]; then
            perl -ne 'next unless s/^alias ([A-Za-z]{3,})=.*/$1/;print' "$a" | tee -a .github/actions/spelling/allow.txt
          fi
        done;

This logic:

  1. Looks for .zsh script files.
  2. Reports the name of the file from which it's going to be getting terms.
  3. Ensures there's a file with content (there's currently a quirk in act where a file tracked by git that matches .gitignore will not be copied into the act environment).
  4. Looks for lines that start with alias and define a name made of at least 3 letters (names containing digits, hyphens, or other characters are not matched).
  5. Reports each item and adds them to the dictionary.
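To see which aliases the pattern in that step accepts, the matching rule can be reproduced outside the workflow. The Python sketch below mirrors the Perl expression from the step; the function name `extract_aliases` and the sample lines are made up for illustration and are not part of the ohmyzsh change.

```python
import re

# Same rule as the Perl one-liner: "alias ", then a name of 3+ letters, then "=".
ALIAS = re.compile(r"^alias ([A-Za-z]{3,})=")


def extract_aliases(lines):
    """Return the alias names that the pattern would add to the dictionary."""
    names = []
    for line in lines:
        match = ALIAS.match(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real plugin file.
    sample = [
        "alias gst='git status'",    # accepted: three letters
        "alias ga='git add'",        # rejected: only two letters
        "  alias gco='x'",           # rejected: line does not start with alias
        "alias git-svn-push='x'",    # rejected: hyphen before the "="
        "alias gcmsg='git commit -m'",
    ]
    print(extract_aliases(sample))
```

Short aliases are deliberately left out, since very short tokens are not treated as words worth adding to the dictionary.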

I've submitted this as ohmyzsh/ohmyzsh#10475.

Neat things

GitLab via act

I know at least one project that runs check-spelling using nektos/act in GitLab. Because of the support for outputs, it would be possible for the act workflow to take the outputs and wire them to an equivalent commenting GitLab mechanism.

Recent deployments

dev.to

This blog is hosted by dev.to which is a deployment of:

forem / forem

For empowering community 🌱



Welcome to the Forem codebase, the platform that powers dev.to. We are so excited to have you. With your help, we can build out Forem’s usability, scalability, and stability to better serve our communities.

What is Forem?

Forem is open source software for building communities. Communities for your peers, customers, fanbases, families, friends, and any other time and space where people need to come together to be part of a collective. See our announcement post for a high-level overview of what Forem is.

dev.to (or just DEV) is hosted by Forem. It is a community of software developers who write articles, take part in discussions, and build their professional profiles. We value supportive and constructive dialogue in the pursuit of great code and career growth for all members. The ecosystem spans from beginner to advanced developers, and all are welcome to find their place…

So, I set up a deployment of check-spelling for forem: https://github.com/check-spelling/forem/actions

It doesn't take much work to convert the output into a list of words to correct (I use Google Sheets to generate corrections):

Thanks for all the work in this PR @jsoref ! Since this is quite a big PR, could you maybe break it down into smaller ones? I'd really like to get some things that are actual bugs (like this one) out faster.

Originally posted by @citizen428 in https://github.com/forem/forem/pull/15670#discussion_r762683014

@jsoref The most important one to me would be everything that is

  1. actual code, not a comment (currrentTime, onkeykup, etc.)
  2. not in a migration

Once we have that out of the way we can think about how to slice and dice the rest.

Originally posted by @citizen428 in https://github.com/forem/forem/pull/15670#discussion_r762689579

That was quickly merged. I've now updated the remainder and split it into two more pieces...

microsoft/PowerToys

I got a ping about some items that weren't being matched in

microsoft / PowerToys

Windows system utilities to maximize productivity

About

Microsoft PowerToys is a set of utilities for power users to tune and streamline their Windows experience for greater productivity. For more info on PowerToys overviews and how to use the utilities, or any other tools and resources for Windows development environments, head over to docs.microsoft.com!

So, I made a quick update for it.

Additional Resources / Info

There are a number of deployments including:
