Changing Your Repo's Language in GitHub

katkelly

Katherine Kelly

Posted on May 29, 2020


I recently organized my pinned repositories on GitHub and noticed that the language shown for one of my repositories didn't quite seem right. It indicated HTML but I was expecting JavaScript because it was a vanilla JavaScript frontend and there were more lines of JavaScript code than HTML.


To really set the scene, here's a screenshot of my pinned repos with the incorrectly labeled repo (IMO) in question:
[Screenshot: pinned repos, with the repo in question labeled HTML]

I did some digging to figure out how GitHub determines a repository's language and how I could change the language shown.

GitHub and the Linguist Library

GitHub says it uses the open-source Linguist library to determine file languages for syntax highlighting and repository statistics.

Once you push changes to a repository on GitHub, Linguist does its thing in a low-priority background job that goes through all of the files to determine the language of each one. Some things to note:

  • all of the languages it knows about are listed in languages.yml
  • excluded files include binary data, vendored code, generated code, documentation, and files in either data (e.g. SQL) or prose (e.g. Markdown) languages; explicit language overrides are taken into account.

To determine the language of each remaining file, Linguist applies the seven strategies below, in the order listed. Each strategy either identifies the exact language or narrows the set of possible languages that gets passed down to the next strategy.

  • Vim or Emacs modeline
  • commonly used filename
  • shell shebang
  • file extension
  • XML header
  • heuristics
  • naïve Bayesian classification
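To make the "identify or narrow down" idea concrete, here is a toy sketch of that cascade. Linguist itself is a Ruby library, so this Python version is not its implementation: the lookup tables, the function names, and the single heuristic rule are all made up for illustration, and only three of the seven strategies are mimicked.

```python
import os
from typing import Callable, Optional

# Hypothetical, heavily simplified lookup tables (not Linguist's real data).
EXTENSIONS = {
    ".js": ["JavaScript"],
    ".html": ["HTML"],
    ".h": ["C", "C++", "Objective-C"],  # an ambiguous extension
}
INTERPRETERS = {"node": ["JavaScript"], "python3": ["Python"], "bash": ["Shell"]}

# A strategy takes (filename, content, current candidates) and returns
# the languages it considers possible (empty list = "no opinion").
Strategy = Callable[[str, str, list[str]], list[str]]


def by_shebang(filename: str, content: str, candidates: list[str]) -> list[str]:
    first_line = next(iter(content.splitlines()), "")
    if not first_line.startswith("#!"):
        return []
    parts = first_line[2:].split()
    if not parts:
        return []
    # "#!/usr/bin/env node" -> "node"; "#!/bin/bash" -> "bash"
    if parts[0].endswith("env") and len(parts) > 1:
        interpreter = parts[1]
    else:
        interpreter = parts[0]
    return INTERPRETERS.get(os.path.basename(interpreter), [])


def by_extension(filename: str, content: str, candidates: list[str]) -> list[str]:
    _, ext = os.path.splitext(filename)
    return EXTENSIONS.get(ext.lower(), [])


def by_heuristic(filename: str, content: str, candidates: list[str]) -> list[str]:
    # Toy rule: among the remaining candidates, "namespace " suggests C++.
    if "C++" in candidates and "namespace " in content:
        return ["C++"]
    return []


STRATEGIES: list[Strategy] = [by_shebang, by_extension, by_heuristic]


def detect(filename: str, content: str) -> Optional[str]:
    candidates: list[str] = []
    for strategy in STRATEGIES:
        result = strategy(filename, content, candidates)
        if len(result) == 1:
            return result[0]      # exact match: stop early
        if result:
            candidates = result   # narrowed list for the next strategy
    return None                   # this toy version gives up if still ambiguous
```

With these made-up tables, `detect("app.js", "console.log('hi')")` is settled by the extension alone, while `detect("util.h", "namespace foo { }")` needs the extension step to narrow the field and the heuristic step to pick C++.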

The results are then used to produce the language stats bar, which shows the languages that make up the repository and their respective percentages. Each percentage is based on the bytes of code for that language, as reported by the List Languages API. The language shown on each of my pinned repos up top is the majority language.

Also, I was today years old when I found out about the language stats bar. If you’re wondering where it is, it’s the colorful bar up at the top of your repository just under the commits/branches/etc. bar. Those colors indicate the languages that make up your repo, and you can click on it to get the full breakdown. 🤯
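As a rough illustration of the byte-based math, here is a small Python sketch. The languages endpoint returns a JSON object mapping each language name to a byte count; the numbers below are invented for the example and are not taken from my repo, and the function names are my own.

```python
from typing import Optional


def language_percentages(byte_counts: dict[str, int]) -> dict[str, float]:
    """Turn {language: bytes} into {language: percent of total bytes}."""
    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {lang: round(100 * n / total, 1) for lang, n in byte_counts.items()}


def majority_language(byte_counts: dict[str, int]) -> Optional[str]:
    """The language with the most bytes, i.e. the one shown on a pinned repo."""
    if not byte_counts:
        return None
    return max(byte_counts, key=byte_counts.get)


# Hypothetical byte counts, shaped like the API response.
sample = {"HTML": 5200, "JavaScript": 4100, "CSS": 700}

print(language_percentages(sample))  # {'HTML': 52.0, 'JavaScript': 41.0, 'CSS': 7.0}
print(majority_language(sample))     # HTML

# Leaving HTML out of the stats changes the totals and the winner.
without_html = {lang: n for lang, n in sample.items() if lang != "HTML"}
print(majority_language(without_html))  # JavaScript
```

Because the comparison is by bytes rather than by lines, a repo can have more lines of one language and still be labeled with another.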

[Screenshot: language stats bar]

Changing the Repo Language Shown

Now that we know the background of how GitHub determines the repository language, I’ll show you how to change the language shown using gitattributes.

  1. Create a .gitattributes file at the top level of your repo
  2. Edit the file and add the line below, subbing in the file extension of the language(s) you want ignored before linguist-detectable=false. Since I want HTML ignored, I’ve used *.html below.

    *.html linguist-detectable=false
    
  3. Add, commit, and push the changes
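If you'd rather script steps 1 and 2, something like the following Python sketch would do it. The helper name `add_gitattributes_rule` is my own invention, and it only edits the file; you still add, commit, and push yourself.

```python
from pathlib import Path


def add_gitattributes_rule(
    repo_dir: str,
    pattern: str,
    attribute: str = "linguist-detectable=false",
) -> bool:
    """Append '<pattern> <attribute>' to .gitattributes unless it is already there.

    Returns True if the file was changed, False if the rule already existed.
    """
    path = Path(repo_dir) / ".gitattributes"
    rule = f"{pattern} {attribute}"
    existing = path.read_text(encoding="utf-8") if path.exists() else ""

    # Skip if an identical rule is already present.
    if rule in (line.strip() for line in existing.splitlines()):
        return False

    # Make sure the new rule starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + rule + "\n", encoding="utf-8")
    return True
```

Calling `add_gitattributes_rule(".", "*.html")` from the top level of the repo writes the same line shown in step 2, and running it a second time leaves the file untouched.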

And voila, the language is changed to JavaScript!

[Screenshot: pinned repos, with the repo now labeled JavaScript]

Resources
About Repository Languages
Linguist
How Do I Change the Category?
