Scraping Yelp and Facebook with Node. Displaying data with ASP.NET Core

shanemlk

Shane Kunz

Posted on January 27, 2020

I spent today adding a feature to StunodsPizza.com's homepage that shows all their positive customer reviews. We wanted to fix up their web branding and add the reviews feature in anticipation of the slow winter season.

You can find the scraping code here: https://github.com/shaneMLK/scrape-facebook-and-yelp-reviews.

Steps

  • Scrape all positive reviews from a Yelp and Facebook page with a Node.js script.
    • This includes grabbing the reviewer's name, avatar image, review text, and source of the review (Yelp or Facebook) and generating a .sql insert script.
  • Create a database schema and insert all the data into an Azure database.
  • On the front end, show all the reviews in a Swiper.js carousel on the homepage using Razor.
    • I want them to be randomly shuffled on page load.

Step 1: Scrape the data with Node.js.

I started with a git repo I had used recently: hacker-DOM's github-by-stars project.

The result was this: scrape-facebook-and-yelp-reviews.

You have to download the pages you want to scrape using browser dev tools, so that any data loaded dynamically on the client side is captured. You then run the program against the HTML files (npx nodemon index.js), and out come SQL insert statements you can run against a database. You can also upload the avatar images to something like Azure Storage or an AWS S3 bucket so a production site can serve them.

For example, I visited the company's Facebook page, opened the inspector, right-clicked on the root <html> tag, and clicked "Copy" -> "Outer HTML". I pasted that into a file named FacebookReviews_1-26-2020.html in the /html_scr folder. I made sure the file was referenced correctly on line 7 of /src/retreiveFacebookReviews.js. The project uses a library called cheerio, which lets us query the DOM of the HTML file as if we were using jQuery. Line 8 sets this up: const $ = cheerio.load(res).

I ran npx nodemon index.js to generate the .sql insert scripts. Before running them, I needed to set up the database schema.
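The generated .sql file is not shown in this post, so here is only a minimal sketch of what turning scraped reviews into insert statements might look like. The function names, the shape of the review objects, and the Reviews table with its UserId, ReviewText, UserName, and Source columns are assumptions for illustration (the column names mirror the Review model used in Step 2); the actual repo may do this differently. The one detail worth copying is doubling single quotes, since review text such as "I've" would otherwise break the statement.

```javascript
// Hypothetical helper: wrap a value as a T-SQL Unicode string literal,
// doubling any single quotes so text like "I've" stays valid SQL.
function sqlString(value) {
  return "N'" + String(value).replace(/'/g, "''") + "'";
}

// Build one INSERT for a scraped review. The table and column names are
// assumptions that mirror the Review model. Id is omitted on the assumption
// that it is an identity column generated by the database.
function toInsertStatement(review) {
  const values = [
    Number.isInteger(review.userId) ? review.userId : 0, // UserId is unused, so default to 0
    sqlString(review.reviewText),
    sqlString(review.userName),
    sqlString(review.source),
  ];
  return (
    'INSERT INTO Reviews (UserId, ReviewText, UserName, Source) VALUES (' +
    values.join(', ') +
    ');'
  );
}

// Join all statements into the text of a .sql script.
function toInsertScript(reviews) {
  return reviews.map((review) => toInsertStatement(review)).join('\n');
}

// Sample data (made up for this sketch).
const script = toInsertScript([
  { reviewText: "Best pizza I've had", userName: 'Jane Doe', source: 'Yelp' },
  { reviewText: 'Great service', userName: 'John Roe', source: 'Facebook' },
]);
console.log(script);
```

String concatenation like this is only reasonable for a one-off script whose output you review before running; application code that talks to the database directly should use parameterized queries instead.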

Step 2: Set up your reviews database schema with Entity Framework and an Azure database.

In my ASP.NET Core project within a /Models/ReviewContext.cs file, I put the following code:

using Microsoft.EntityFrameworkCore;
using Newtonsoft.Json;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

namespace MyProject.Models
{
    public class ReviewContext : DbContext
    {
        public ReviewContext (DbContextOptions<ReviewContext> options)
            : base(options)
        { }
        public DbSet<Review> Reviews { get; set; }

    }

    public class Review
    {
        public int Id { get; set; }
        // UserId turned out to be unnecessary
        public int UserId { get; set; }
        public string ReviewText { get; set; }
        public string UserName { get; set; }
        public string Source { get; set; }
    }
}

As a side note, in the ConfigureServices method of Startup.cs, I have the following line...

services.AddDbContext<ReviewContext>(options => 
options.UseSqlServer(
Configuration.GetValue<string>("AppSettings:StorageConnectionString")));

... which lets me keep my Azure database connection string in appSettings.json, under AppSettings as StorageConnectionString. Entity Framework uses this connection string when it updates the database schema.

I run dotnet ef migrations add "ReviewsMigration" to create a migration. A migration is a generated set of steps that brings the database schema in line with the model; nothing is applied to the database until you run an update.

Then I run dotnet ef database update to actually update the database's schema. Note that if you have an appSettings.Development.json, the update will use that file's StorageConnectionString, not the one in appSettings.json.
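To see why the Development file wins, it can help to picture configuration as a flat set of colon-separated keys, where each later source overrides matching keys from earlier ones. The sketch below models that idea in plain JavaScript. It is an illustration of the layering behavior, not the framework's code, and the function names and connection string values are placeholders. It assumes appSettings.Development.json is loaded after appSettings.json, which is what the default ASP.NET Core host does in the Development environment.

```javascript
// Flatten nested settings into "Section:Key" paths, the same key style used by
// Configuration.GetValue<string>("AppSettings:StorageConnectionString").
function flatten(settings, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(settings)) {
    const path = prefix ? prefix + ':' + key : key;
    if (value !== null && typeof value === 'object') {
      flatten(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Later sources override earlier ones, key by key.
function layer(...sources) {
  return Object.assign({}, ...sources.map((source) => flatten(source)));
}

// Placeholder contents standing in for the two JSON files.
const appSettings = {
  AppSettings: {
    StorageConnectionString: 'production-connection-string',
    AzureBlobStorageAccountName: 'myaccount',
  },
};
const appSettingsDevelopment = {
  AppSettings: { StorageConnectionString: 'development-connection-string' },
};

const effective = layer(appSettings, appSettingsDevelopment);
console.log(effective['AppSettings:StorageConnectionString']); // development-connection-string
console.log(effective['AppSettings:AzureBlobStorageAccountName']); // myaccount
```

Only the keys present in the Development file are replaced; everything else still comes from appSettings.json.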

Step 3: Display the reviews on the front end using Razor.

Within /Views/Shared/_Layout.cshtml I include the Swiper.js JavaScript and styles.

<link rel="stylesheet" href="https://unpkg.com/swiper/css/swiper.min.css">
<script src="https://unpkg.com/swiper/js/swiper.min.js"></script>

The _Layout.cshtml file is what wraps all my views. The method @RenderBody() is where my inner views will render.

I edited my Index function in the HomeController to pass all the reviews to the Views/Home/Index.cshtml view by using return View(_context.Reviews.ToList().Shuffle());. But in order to have access to the database context, we need to use dependency injection. At the top of the HomeController class we use the following code to tell ASP.NET to pass the database context.

        private readonly ReviewContext _context;

        public HomeController(ReviewContext context)
        {
            _context = context;
        }

The Shuffle method is a static extension method on the IList<T> type, declared outside the HomeController class but within the same file. It simply randomizes the order of the reviews:

    public static class ShuffleExtension
    {
        // Fisher-Yates shuffle: shuffles the list in place and returns it.
        public static IList<T> Shuffle<T>(this IList<T> list)
        {
            Random rng = new Random();
            int n = list.Count;
            while (n > 1)
            {
                n--;
                int k = rng.Next(n + 1); // 0 <= k <= n
                T value = list[k];
                list[k] = list[n];
                list[n] = value;
            }
            return list;
        }
    }
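As a cross-check on the logic, here is the same Fisher-Yates walk sketched in JavaScript. Two things differ from the C# version, purely for illustration: it shuffles a copy instead of the original list, and it accepts an optional random function (a hypothetical parameter) so the result can be made predictable when testing.

```javascript
// Fisher-Yates shuffle on a copy of the input.
// `random` must return a number in [0, 1); it defaults to Math.random.
function shuffle(items, random = Math.random) {
  const list = items.slice();
  let n = list.length;
  while (n > 1) {
    n--;
    // Pick an index k with 0 <= k <= n and swap it into position n.
    const k = Math.floor(random() * (n + 1));
    const value = list[k];
    list[k] = list[n];
    list[n] = value;
  }
  return list;
}

// With a stubbed random that always returns 0, the walk is deterministic.
console.log(shuffle([1, 2, 3, 4], () => 0)); // [ 2, 3, 4, 1 ]

// With the default Math.random, the order varies from run to run.
console.log(shuffle(['Yelp review', 'Facebook review', 'Another review']));
```

Each pass fixes one element at the end of the unshuffled range, so the loop finishes in a single sweep of the list.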

At the top of the homepage view (/Views/Home/Index.cshtml) I write @model List<Review> to declare that the view is expecting a list of reviews. Our reviews carousel is going to be a separate partial view block, so we render it using @await Html.PartialAsync("_ReviewsBlock", Model) within the /Views/Home/Index.cshtml.

Within /Views/Shared/_ReviewsBlock.cshtml, I grab some AppSettings values and declare that the block expects a list of reviews as well.

@using Microsoft.Extensions.Configuration
@inject IConfiguration Configuration
@model List<Review>
@{
    var AzureBlobStorageAccountName = Configuration.GetSection("AppSettings")["AzureBlobStorageAccountName"];
    var AzureBlobStorageContainer_Users = Configuration.GetSection("AppSettings")["AzureBlobStorageContainer_Users"];
}

The appSettings.json values come from Azure's Blob Storage service. I have a container just for the reviewers' avatar images, and I've allowed it to be accessed anonymously. I upload the images straight from the Node project's /output folder to the Azure container. I can then reference them from the view like so: https://@(AzureBlobStorageAccountName).blob.core.windows.net/@(AzureBlobStorageContainer_Users)/@("user_review_img_" + review.UserName.Replace(" ", "_") + ".jpg").
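For those URLs to resolve, the file names written by the Node script have to match what the Razor expression builds. A minimal sketch of that naming rule in JavaScript follows; the function names and the sample account and container names are hypothetical, and the real script may name its files differently.

```javascript
// Mirror of the Razor expression: "user_review_img_" + the user name with
// every space replaced by an underscore + ".jpg".
function avatarFileName(userName) {
  return 'user_review_img_' + userName.replace(/ /g, '_') + '.jpg';
}

// URL of a blob in an anonymously readable container.
function avatarUrl(accountName, container, userName) {
  return (
    'https://' + accountName + '.blob.core.windows.net/' +
    container + '/' + avatarFileName(userName)
  );
}

console.log(avatarFileName('Jane Q Doe'));
// user_review_img_Jane_Q_Doe.jpg
console.log(avatarUrl('myaccount', 'user-avatars', 'Jane Q Doe'));
// https://myaccount.blob.core.windows.net/user-avatars/user_review_img_Jane_Q_Doe.jpg
```

This only handles spaces, as the Razor expression does. Names containing other characters, such as apostrophes or accents, would need the same extra handling on both the Node side and the view side.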

I used the Swiper.js get started guide to craft the carousel.

The main HTML structure is:

<!-- Slider main container -->
<div class="swiper-container">
    <!-- Additional required wrapper -->
    <div class="swiper-wrapper">
        <!-- Slides -->
        <div class="swiper-slide">Slide 1</div>
        <div class="swiper-slide">Slide 2</div>
        <div class="swiper-slide">Slide 3</div>
        ...
    </div>

    <!-- If we need navigation buttons -->
    <div class="swiper-button-prev"></div>
    <div class="swiper-button-next"></div>
</div>

... and Swiper handles a lot of the styling for us with these classes.

I loop through the reviews and render carousel slides:

@foreach (var review in Model)
{
    <div class="swiper-slide">
    . . . 
    </div>
}

Within the carousel I can display the review data using @review.UserName, @review.Source, and @review.ReviewText.

Lastly, there's a <script> tag to initialize the carousel after the page is done loading...

<script>
    $(document).ready(function(){
        var mySwiper = new Swiper ('.image-slide .swiper-container', {
            direction: 'horizontal',
            loop: true,
            slidesPerView: 1,
            autoplay: {
                delay: 3000,
            },
        });
    });
</script>

I specify .image-slide .swiper-container as the selector so it doesn't conflict with other .swiper-container elements on the page.

After some styling, with heavy use of CSS Flexbox, I think the result turned out simple and effective.

Carousel Reviews Swiping
