GitDesc: Extract ReadMe from GitHub

syncrofosatron

Neeraj Mishra

Posted on March 2, 2021

Hello there!

Welcome to my first post and my first ever tutorial.

I started learning web development last year, just HTML and CSS. I picked it up again last month and thought I'd learn best by writing about it as I go, so keep in mind that I am fairly new to this.
We will be making a website that extracts the description (the ReadMe text) from a GitHub repository. All we need is the link to the repository.

Note: I am using the Hyper terminal to run the commands, and I am assuming you have a recent version of Node.js and npm installed. You could use any other terminal, or even do things manually if you're such a masochist. XD
Alright now, let's get to work!

  • Make a directory (folder) anywhere you like and name it anything you like. To keep things easy to follow, I'll be using the name GitDesc.
    To create the folder, use the command: mkdir GitDesc
    After you have done so, change your current directory to the GitDesc directory with the command: cd GitDesc

  • Now it's time to initialize npm and install the packages our project needs. Go to the terminal and run the following commands.

  1. npm init
    You will be asked to fill in some details about the project
    you are creating. If you don't want to fill in anything, you
    can just press enter until it stops asking.
    There is also a shortcut that fills in the details
    automatically: npm init -y.

  2. npm install express --save
    Express is a framework for creating web applications, which is
    just what we want. If you wanna learn more, here's the link: Express.

  3. npm install body-parser --save
    We will make an HTML file where the user enters the link of
    a GitHub repository. This package parses the submitted form
    data so that the link is available in our server-side JavaScript.
    After that, we will make an HTTP request to that link on
    GitHub and extract the required ReadMe content.

  4. npm install cheerio --save
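    With its help, we will narrow the downloaded HTML down to the
    text we need. Cheerio is a library that parses HTML and lets us
    query it with jQuery-style selectors. It is commonly used for
    web scraping, and that's what we are doing here too.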

  5. npm install request --save
    This is required to make the HTTP request to the website.
    Note that the request package is deprecated, but it still
    works for this tutorial.

You can omit --save: since npm 5, installed packages are saved to package.json by default.

  • Now we will create two files: an HTML file for the user interface and a JavaScript file responsible for the server-side scripting.
    The touch command is used to create files, and the names we will use are "index" and "app".
    So do the following:
    touch index.html
    touch app.js

  • Open the project in a code editor. I am using Atom. To open the project in Atom from the command line, run atom . in the working directory. This will open up Atom with the project we are currently working on.

  • Open the app.js file.
    First, we will have to import all the packages that we have installed so far.

const cheerio = require('cheerio');
const request = require('request');
const express = require('express');
const bodyParser = require('body-parser');
  • Create a new app and set up body-parser.
const app = express();

app.use(bodyParser.urlencoded({extended: true}));

For more information on body-parser, you can check this
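To get a feel for what body-parser hands us, here is a rough sketch of URL-encoded form parsing using Node's built-in URLSearchParams. This is not how body-parser is implemented. The function name parseFormBody and the sample repository link are made up for illustration.

```javascript
// Illustration only: turn a URL-encoded form body into a plain object,
// similar in spirit to what ends up in req.body.
function parseFormBody(rawBody) {
  const params = new URLSearchParams(rawBody);
  const body = {};
  for (const [key, value] of params) {
    body[key] = value;
  }
  return body;
}

// Roughly what a browser sends for a form with a field named "gitHubLink"
// (the repository link below is a made-up example).
const raw = 'gitHubLink=https%3A%2F%2Fgithub.com%2Fsome-user%2Fsome-repo&submit=';
const parsed = parseFormBody(raw);
console.log(parsed.gitHubLink); // prints https://github.com/some-user/some-repo
```

The keys of the resulting object come from the name attributes of the form fields, which is why the name used in the HTML and the name read on the server have to match exactly.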

  • Now that everything has been set up and initialized, we can finally get down to the meat. But before that, we should think about what the interface is going to look like.

    1. There will be a simple HTML page with an input box and a submit button.

    2. After pressing the button, the results will be displayed in the page.

  • Now we will send the whole HTML file as the response to a GET request.

app.get("/", function(req, res)
{
  res.sendFile(__dirname + "/index.html");
})
  • Below, we handle the POST request sent by the form: we fetch the repository page from GitHub, extract the ReadMe, and send it back to the webpage.
app.post("/", function(req, res)
{
  // This is the URL we have received from the HTML form.
  // The key must match the name attribute of the input box.
  const url = req.body.gitHubLink;

  // This is the HTTP request to the website.
  // response is what we receive from the website.
  // html contains the website's HTML.
  request(url, function(error, response, html)
  {
    // If the request failed (for example, because of a bad link),
    // report it instead of trying to parse an empty page.
    if (error)
    {
      res.status(500).send("Could not fetch the repository page.");
      return;
    }

    // This command will load the HTML of the
    // repository and store it in a constant.
    const $ = cheerio.load(html);

    // This is the class under which GitHub puts the ReadMe
    // information. We extract that information and
    // store it in a constant.
    const body = $('.markdown-body');
    res.write("<h3>" + "Description" + "</h3>");

    // To extract just the text from the body,
    // we are using body.text().
    res.write("<textarea cols=\"90\" rows=\"30\">" + body.text() + "</textarea>");

    // End the response now that everything has been written.
    res.end();
  });
});
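Two things the handler above takes on trust are the submitted link and the ReadMe text. The sketch below is my own addition, not part of the original project. It shows one possible way to check that a link points at github.com and to escape text before placing it inside HTML. The helper names isGitHubUrl and escapeHtml are hypothetical.

```javascript
// Returns true only for http(s) links whose host is github.com.
function isGitHubUrl(link) {
  try {
    const parsed = new URL(link);
    const okProtocol = parsed.protocol === 'https:' || parsed.protocol === 'http:';
    const okHost = parsed.hostname === 'github.com' || parsed.hostname === 'www.github.com';
    return okProtocol && okHost;
  } catch (err) {
    return false; // new URL() throws on strings that are not valid URLs
  }
}

// Replaces characters that have a special meaning in HTML,
// so the text cannot break out of the <textarea>.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(isGitHubUrl('https://github.com/some-user/some-repo')); // true
console.log(isGitHubUrl('not a link'));                            // false
console.log(escapeHtml('</textarea>'));                            // &lt;/textarea&gt;
```

In the POST handler, the server could reply with an error message when isGitHubUrl(url) is false, and write escapeHtml(body.text()) instead of body.text().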
  • To start the server, we will do the following:
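// You can use any port number you like.
// Most common is 3000.
// On some systems, ports below 1024 (such as 666) need elevated
// privileges, so switch to a higher port if the server fails to start.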
app.listen(666, function()
{
    console.log("Server started...");
})
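If you would rather not hard-code the port, one common pattern is to read it from an environment variable and fall back to a default. This is only a sketch: the variable name PORT and the default of 3000 are assumptions, and resolvePort is a hypothetical helper.

```javascript
// Pick the port from an environment-like object, falling back to a default.
function resolvePort(env, fallback) {
  const port = Number.parseInt(env.PORT, 10);
  const valid = Number.isInteger(port) && port > 0 && port < 65536;
  return valid ? port : fallback;
}

const port = resolvePort(process.env, 3000);
console.log('Would listen on port ' + port);
// In app.js, this value could be passed as the first argument of app.listen(...).
```

With this in place, starting the server as PORT=8080 node app.js would use port 8080, while a plain node app.js would use 3000.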
  • Now for the final step, open up index.html and add the following lines within the <body> tag:
<form action="/" method="post">
  <input type="url" name="gitHubLink" placeholder="Enter link here...">
  <button type="submit" name="submit">Submit</button>
</form>
  • Start the server, and with it the web application we have just made, by typing node app.js in the terminal.
    You'll see the log "Server started...", which means the server is listening on the port you specified and there are no errors.

  • Open your web browser and type in localhost:666/. Don't forget to replace the port number with the one you specified.

  • Copy the repository link from GitHub, paste it into the input box and Voila! You have made a web app which scrapes whatever the developer has written in the ReadMe.

The link for the project on GitHub is here.
I really hope it was not very hard to follow.
Suggestions to improve are always welcome :)