Changing Your Repo's Language in GitHub
Katherine Kelly
Posted on May 29, 2020
I recently organized my pinned repositories on GitHub and noticed that the language shown for one of my repositories didn't quite seem right. It indicated HTML, but I was expecting JavaScript because it was a vanilla JavaScript frontend and there were more lines of JavaScript code than HTML.
To really set the scene, here's a screenshot of my pinned repos with the incorrectly labeled repo (IMO) in question:
I did some digging to figure out how GitHub determines a repository's language and how I could change the language shown.
GitHub and the Linguist Library
GitHub says it uses the open-source Linguist library to determine file languages for syntax highlighting and repository statistics.
Once you push changes to a repository on GitHub, Linguist runs a low-priority background job that goes through all of the files to determine the language of each one. Some things to note:
- all of the languages it knows about are listed in languages.yml
- excluded files include binary data, vendored code, generated code, documentation, files in either data (e.g., SQL) or prose (e.g., Markdown) languages, and explicit language overrides.
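To make the exclusion rules a little more concrete, here is a minimal Python sketch of the filtering step. It assumes a tiny, hypothetical stand-in for languages.yml (an extension mapped to a language name and a type such as programming, markup, data, or prose) and hypothetical vendored and documentation path prefixes. It only models the binary, vendored, documentation, and data/prose exclusions; generated-code detection and overrides are left out, and Linguist's real rules are far more extensive.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical, heavily simplified stand-in for a few languages.yml entries:
# file extension -> (language name, language type).
LANGUAGES = {
    ".js": ("JavaScript", "programming"),
    ".html": ("HTML", "markup"),
    ".sql": ("SQL", "data"),
    ".md": ("Markdown", "prose"),
}

# Hypothetical path prefixes; the real vendored/documentation lists are much longer.
VENDORED_PREFIXES = ("node_modules/", "vendor/")
DOCUMENTATION_PREFIXES = ("docs/",)

# Language types left out of the repository statistics.
EXCLUDED_TYPES = {"data", "prose"}


@dataclass
class RepoFile:
    path: str
    is_binary: bool = False


def counted_language(f: RepoFile) -> Optional[str]:
    """Return the language a file counts toward, or None if it is excluded."""
    if f.is_binary:
        return None
    if f.path.startswith(VENDORED_PREFIXES + DOCUMENTATION_PREFIXES):
        return None
    entry = LANGUAGES.get(os.path.splitext(f.path)[1])
    if entry is None:
        return None  # unknown extension: nothing to count in this sketch
    name, lang_type = entry
    if lang_type in EXCLUDED_TYPES:
        return None  # e.g. SQL (data) and Markdown (prose)
    return name


files = [
    RepoFile("index.html"),
    RepoFile("src/app.js"),
    RepoFile("db/schema.sql"),
    RepoFile("README.md"),
    RepoFile("node_modules/lib/index.js"),
    RepoFile("logo.png", is_binary=True),
]
print([counted_language(f) for f in files])
# ['HTML', 'JavaScript', None, None, None, None]
```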
To determine the language of each remaining file, Linguist applies the seven strategies listed below, in this order. Each step either identifies the exact language or narrows the list of possible languages passed down to the next strategy.
- Vim or Emacs modeline
- commonly used filename
- shell shebang
- file extension
- XML header
- heuristics
- naïve Bayesian classification
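The narrowing behavior can be modeled in Python as a loop over strategy functions: a strategy that returns exactly one language ends the search, one that returns several narrows the candidate list, and one that returns nothing leaves the candidates unchanged. This is a simplified sketch under stated assumptions, not Linguist's actual code. The lookup tables, the modeline and shebang parsing, and the C/C++/Objective-C heuristic are hypothetical toys, and the XML header and Bayesian classification steps are omitted, so a file that is still ambiguous at the end returns None.

```python
import os
import re
from typing import Callable, List, Optional

# Hypothetical, tiny lookup tables (real data comes from languages.yml).
MODELINE_NAMES = {"python": "Python", "javascript": "JavaScript", "cpp": "C++"}
FILENAMES = {"Makefile": "Makefile", "Dockerfile": "Dockerfile"}
INTERPRETERS = {"python3": "Python", "node": "JavaScript", "bash": "Shell"}
EXTENSIONS = {
    ".py": ["Python"],
    ".js": ["JavaScript"],
    ".h": ["C", "C++", "Objective-C"],  # ambiguous on purpose
}

Strategy = Callable[[str, str, List[str]], List[str]]


def modeline(name: str, content: str, candidates: List[str]) -> List[str]:
    # Matches e.g. "vim: set ft=python:" or "-*- mode: python -*-"
    m = re.search(r"vim:\s*set\s+(?:ft|filetype)=(\w+)", content) or re.search(
        r"-\*-\s*mode:\s*(\w+)\s*-\*-", content
    )
    if m and m.group(1).lower() in MODELINE_NAMES:
        return [MODELINE_NAMES[m.group(1).lower()]]
    return []


def filename(name: str, content: str, candidates: List[str]) -> List[str]:
    base = os.path.basename(name)
    return [FILENAMES[base]] if base in FILENAMES else []


def shebang(name: str, content: str, candidates: List[str]) -> List[str]:
    if not content.startswith("#!"):
        return []
    parts = content.splitlines()[0][2:].split()
    if not parts:
        return []
    interpreter = parts[0].rsplit("/", 1)[-1]
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]  # "#!/usr/bin/env python3" -> "python3"
    return [INTERPRETERS[interpreter]] if interpreter in INTERPRETERS else []


def extension(name: str, content: str, candidates: List[str]) -> List[str]:
    return EXTENSIONS.get(os.path.splitext(name)[1], [])


def heuristics(name: str, content: str, candidates: List[str]) -> List[str]:
    # Toy content checks that only break ties among existing candidates.
    if "Objective-C" in candidates and "@interface" in content:
        return ["Objective-C"]
    if "C++" in candidates and re.search(r"\b(namespace|template|class)\b", content):
        return ["C++"]
    return []


STRATEGIES: List[Strategy] = [modeline, filename, shebang, extension, heuristics]


def detect(name: str, content: str) -> Optional[str]:
    candidates: List[str] = []
    for strategy in STRATEGIES:
        result = strategy(name, content, candidates)
        if candidates:
            # Once narrowed, only keep guesses that are still candidates.
            result = [lang for lang in result if lang in candidates]
        if len(result) == 1:
            return result[0]  # exact match: stop here
        if len(result) > 1:
            candidates = result  # narrow and pass down
    return None  # still ambiguous (the omitted steps would go here)


print(detect("Makefile", "all:\n\techo hi"))             # Makefile
print(detect("run", "#!/usr/bin/env python3\nprint(1)"))  # Python
print(detect("util.h", "namespace geo { int x; }"))       # C++
print(detect("util.h", "int x;"))                         # None
```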
The results are then used to produce the language stats bar, which shows the languages that make up the repository and their respective percentages. Each percentage is based on the bytes of code for that language, as reported by the List Languages API. The language shown for each of my pinned repos up top is the one with the largest share.
Also, I was today years old when I found out about the language stats bar. If you’re wondering where it is, it’s the colorful bar at the top of your repository, just under the commits/branches/etc. bar. The colors indicate the languages that make up your repo, and you can click on it to get the full breakdown. 🤯
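The percentages in the stats bar can be reproduced from the byte counts that the List Languages API returns, which map each language name to a number of bytes. Below is a minimal Python sketch, assuming you already have that mapping as a dictionary; the byte counts are made up for illustration. Because the calculation counts bytes rather than lines, a language with long lines, such as HTML, can outweigh one with more lines.

```python
from typing import Dict, List, Optional, Tuple


def language_percentages(byte_counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Turn {language: bytes} into (language, percent) pairs, largest first."""
    total = sum(byte_counts.values())
    if total == 0:
        return []
    shares = [(lang, 100 * size / total) for lang, size in byte_counts.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def top_language(byte_counts: Dict[str, int]) -> Optional[str]:
    """The language with the most bytes (the one shown on a pinned repo)."""
    return max(byte_counts, key=byte_counts.get) if byte_counts else None


# Made-up numbers in the same shape as a List Languages API response.
example = {"HTML": 6000, "JavaScript": 3000, "CSS": 1000}
print(language_percentages(example))
# [('HTML', 60.0), ('JavaScript', 30.0), ('CSS', 10.0)]
print(top_language(example))  # HTML
```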
Changing the Repo Language Shown
Now that we know the background of how GitHub determines the repository language, I’ll show you how to change the language shown using gitattributes.
- Create a .gitattributes file at the top level of your repo.
- Edit the file and add the line below: a pattern matching the file extension of the language you want ignored, followed by linguist-detectable=false (add one line per language). Since I want HTML ignored, my line uses *.html.
*.html linguist-detectable=false
- Add, commit, and push the changes.
And voila, the language is changed to JavaScript!
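To see why that one line is enough, here is a rough Python model of its effect: files whose names match a linguist-detectable=false pattern stop contributing bytes, and the top language is recomputed from what is left. This is a sketch under assumptions, not how Linguist actually reads attributes. It only handles patterns without a slash (which, following the same rules as .gitignore, match a file's basename in any directory), and the file list and sizes are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Hypothetical repo contents: (path, detected language, size in bytes).
FILES: List[Tuple[str, str, int]] = [
    ("index.html", "HTML", 6000),
    ("src/app.js", "JavaScript", 3000),
    ("css/style.css", "CSS", 1000),
    ("public/about.html", "HTML", 2000),
]


def detectable_bytes(
    files: List[Tuple[str, str, int]], undetectable: List[str]
) -> Dict[str, int]:
    """Sum bytes per language, skipping files matched by an undetectable pattern."""
    totals: Dict[str, int] = {}
    for path, language, size in files:
        # Slash-free patterns are compared against the basename only,
        # so "*.html" also matches "public/about.html".
        name = posixpath.basename(path)
        if any(fnmatchcase(name, pattern) for pattern in undetectable):
            continue
        totals[language] = totals.get(language, 0) + size
    return totals


before = detectable_bytes(FILES, [])
after = detectable_bytes(FILES, ["*.html"])  # *.html linguist-detectable=false
print(max(before, key=before.get), before)
# HTML {'HTML': 8000, 'JavaScript': 3000, 'CSS': 1000}
print(max(after, key=after.get), after)
# JavaScript {'JavaScript': 3000, 'CSS': 1000}
```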
Resources
About Repository Languages
Linguist
How Do I Change the Category?