Neeraj Mishra
Posted on March 2, 2021
Hello there!
Welcome to my first post and my first ever tutorial.
I started learning web development last year, but only HTML and CSS. I picked it up again last month, and since I am fairly new to this, I thought I'd learn best by writing about it as I go.
We will be making a website that extracts the description from a GitHub repository. All we need is the link to the repository.
Note: I am using Hyper terminal for the execution of the commands and I am hoping that you have the latest version of Node.js installed as well as npm. You could use any other terminal, or even do things manually if you're such a masochist. XD
Alright now, let's get to work!
Make a directory or folder anywhere you like and name it anything you like. To keep things simple, I'll be using the name GitDesc.
To create the folder, use the command:
mkdir GitDesc
After you have done so, change your current directory to the GitDesc directory by using the command:
cd GitDesc
Now it's time to initialize npm and install the required packages for our project. So go to the terminal and run the following commands to initialize npm and install the packages.
- npm init
  You will be asked to fill in some details about the project you are creating. If you don't want to fill in anything, you can just press Enter until it stops asking. There is also a shortcut that fills in the details automatically: npm init -y
- npm install express --save
  This is a framework for creating web applications, which is just what we want. If you wanna learn more, here's the link: Express.
- npm install body-parser --save
  We will make an HTML file where the user enters the link of a GitHub repository, and this package makes that link available to the server-side JavaScript. After that, we will make a request to GitHub using that link and extract the required ReadMe content.
- npm install cheerio --save
  Cheerio is a library for parsing and querying HTML, commonly used for web scraping, which is what we are doing here too. With its help, we will narrow the page down to the text we need.
- npm install request --save
  This is required in order to make HTTP requests to the website.
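You can omit --save: since npm 5, installed packages are saved to package.json by default. Also note that the request package has been deprecated since February 2020; it still works for this tutorial.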
Now we will create two files: an HTML file for the user interface and a JavaScript file responsible for the server-side scripting.
The touch command is used to create files, and the names we will use are "index" and "app".
So run the following:
touch index.html
touch app.js
Open the project in a code editor. I am using Atom, and to open the project in Atom from the command line, run:
atom .
in the working directory. This will open up Atom with the project we are currently working on.
Open the app.js file.
First, we will have to import all the packages that we have installed so far.
const cheerio = require('cheerio');
const request = require('request');
const express = require('express');
const bodyParser = require('body-parser');
- Create a new app and set up the body-parser.
const app = express();
app.use(bodyParser.urlencoded({extended: true}));
For more information on body-parser, you can check its documentation.
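To get a feel for what the urlencoded parser hands us, here is a rough sketch that decodes the same kind of body our form will submit, using Node's built-in URLSearchParams. This is only an illustration, not how body-parser is implemented internally; parseFormBody is a made-up helper and the repository URL is a placeholder.

```javascript
// A form with <input name="gitHubLink"> submits a body like this.
// The repository URL below is a made-up example.
const rawBody = 'gitHubLink=https%3A%2F%2Fgithub.com%2Fsome-user%2Fsome-repo&submit=';

// Illustration only (hypothetical helper): decode the body into a plain
// object, similar in spirit to what ends up in req.body.
function parseFormBody(body) {
  return Object.fromEntries(new URLSearchParams(body));
}

const parsed = parseFormBody(rawBody);
console.log(parsed.gitHubLink); // https://github.com/some-user/some-repo
```

The key of each entry comes from the name attribute of the form field, which is why the name in the HTML and the property read from req.body have to match exactly, including capitalization.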
Now that everything has been set up and initialized, we can finally get down to the meat. But before that, we should think about what the interface is going to look like.
1. There will be a simple HTML page with an input box and a submit button.
2. After pressing the button, the results will be displayed in the page.
Now we will send the whole HTML file in response to a GET request.
app.get("/", function(req, res)
{
  res.sendFile(__dirname + "/index.html");
})
- Below, we handle the POST request from the form and write the information we receive from GitHub back to the webpage.
app.post("/", function(req, res)
{
  // This is the URL we have received from the HTML form.
  // The property name must match the input's name attribute ("gitHubLink").
  const url = req.body.gitHubLink;
  // This is the HTTP request to the website.
  // response is what we receive from the website.
  // html contains the website's HTML.
  request(url, function(error, response, html)
  {
    // Stop early if the request failed, e.g. because the link is invalid.
    if (error)
    {
      res.send("Could not fetch the repository page.");
      return;
    }
    // This command will load the HTML of the
    // repository and store it in a constant.
    const $ = cheerio.load(html);
    // This is the class under which GitHub's ReadMe
    // information is put. We are going to extract that
    // information and store it in a constant.
    const body = $('.markdown-body');
    res.write("<h3>" + "Description" + "</h3>");
    // To extract just the text from the body,
    // we are using body.text().
    res.write("<textArea cols=\"90\" rows = \"30\">" + body.text() + "</textArea>");
    res.send();
  })
})
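The handler above passes whatever the user typed straight to request and writes the ReadMe text into the page as-is. As an optional hardening step (not part of the original project), here is a minimal sketch of two helpers: one that checks the link really looks like a GitHub repository URL, and one that escapes HTML special characters before the text goes into the textarea. Both isGitHubRepoUrl and escapeHtml are hypothetical names I made up for this example.

```javascript
// Hypothetical helper: returns true only for links that look like
// https://github.com/<owner>/<repo>.
function isGitHubRepoUrl(link) {
  let parsed;
  try {
    parsed = new URL(link);
  } catch (err) {
    return false; // not a valid absolute URL at all
  }
  if (parsed.protocol !== 'https:' || parsed.hostname !== 'github.com') {
    return false;
  }
  // We need at least an owner and a repository name in the path.
  const parts = parsed.pathname.split('/').filter(Boolean);
  return parts.length >= 2;
}

// Hypothetical helper: escape characters that are special in HTML so the
// scraped text cannot break out of the <textarea>.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(isGitHubRepoUrl('https://github.com/some-user/some-repo')); // true
console.log(isGitHubRepoUrl('http://localhost:666/'));                  // false
console.log(escapeHtml('</textarea><b>hi</b>'));
// &lt;/textarea&gt;&lt;b&gt;hi&lt;/b&gt;
```

In the POST handler, one way to use these would be to return an error message when isGitHubRepoUrl(url) is false, and to write escapeHtml(body.text()) instead of body.text().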
- To start the server, we will do the following:
// You can use any port number you like.
// Most common is 3000.
app.listen(666, function()
{
  console.log("Server started...");
})
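If you'd rather not hard-code the port, a common pattern is to read it from an environment variable and fall back to a default. This is just a sketch of that idea: pickPort is a hypothetical helper, and the PORT variable name and the default of 3000 are my assumptions, not something the project requires. It can also come in handy because on many Unix-like systems, ports below 1024 need elevated privileges.

```javascript
// Hypothetical helper: use PORT from the environment if it is a valid
// port number, otherwise fall back to the given default.
function pickPort(env, fallback) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

const port = pickPort(process.env, 3000);
console.log('Would listen on port ' + port);
// In app.js this value could be passed along as: app.listen(port, ...)
```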
- Now for the final step, open up index.html and add the following lines within the <body> tag:
<form action="/" method="post">
  <input type="url" name="gitHubLink" placeholder="Enter link here...">
  <button type="submit" name="submit">Submit</button>
</form>
Start the server, and with it the web application we have just made, by typing
node app.js
in the terminal.
You'll see the log "Server started...", which means the server is listening on the port you specified and there are no errors.
Open your web browser and type in
localhost:666/
(don't forget to replace the port number with the one you specified).
Copy the repository link from GitHub, paste it into the input box and voilà! You have made a web app that scrapes whatever the developer has written in the ReadMe.
The link for the project on GitHub is here.
I really hope it was not very hard to follow.
Suggestions to improve are always welcome :)