Save Money and Frustration on Amazon using Azure Functions

fifiteklemedhin

FifiTeklemedhin

Posted on December 29, 2020

Save Money and Frustration on Amazon using Azure Functions

Amazon is a service known for convenience and efficiency. However, it is often frustrating to buy a product on Amazon only to find out that another site was selling it for less, or that the item was sold at a discount shortly after. This dilemma is contradictory to the benefits of online shopping mentioned above, with little efficiency and convenience in having to manually search for the best prices.

Recently, I have been learning about Microsoft Azure, and found the PERFECT way to save myself lots money and frustration! Through serverless computing, automating the task of observing a product’s price becomes a fun and affordable challenge. The result I came up with was PriceScraper, a web app that takes in an Amazon link and some basic information and sends a message if an item fluctuates in price. In this tutorial, I will walk you through how to create your own PriceScraper app to gain a better understanding of Azure as well as price fluctuations on Amazon.

Note: Before you start, please remember to react to and comment on this article! I would love any feedback to improve your experience.

Overview

Flowchart:

Alt Text

The program interface is a basic form with fields for a link to a product, a phone number, the number of days the user wants to receive notifications, and a baseline discount percentage for the item. The website triggers an Azure Function HTTP trigger, which inputs the data into a SQL database. Then, a timer trigger checks the price of the item every day, updates the database, and sends a text message to the user under three conditions. The user is notified if an item’s price increases from the time the user submits the form, the price decreases to the desired discount value, or the notification period is reached.

Heads Up! By the end of this tutorial, you will have:

  • Installed Visual Studio Code
  • Created a Microsoft Azure account and subscription (free trial included)
  • Created a Twilio account (free trial included)
  • Created a GitHub account

Step 1: Setting up the Basics:

Create an Azure Account:

In order to do anything, we need to have access to Azure! This means creating an account and subscription. As mentioned above, Azure comes with a free $200 credit, which is PLENTY for our purposes. If you already have an account, then create a new subscription or resource group in order to store everything related to your project.

Install VSCode

We are using Python as the language for our functions, and sadly, the Azure Portal is very limited in terms of use with Python than it is with other runtime stacks. Because of this issue, we are using Visual Studio Code as an alternative. VSCode has GREAT integration with Azure, as both the editor and the platform were created by Microsoft. We will be using the Azure Portal for administrative configurations (mainly with creating resources and managing the remote app), but VSCode is where most of the work will be done.

So, with that information in mind, it is time to download VSCode! This is a relatively straightforward process; all you have to do is search ‘download VSCode’ or follow this link. After installing the editor, I would highly reccomend signing into Github through the editor. While this tutorial does not involve version controlling anything, it is the wisest decision to keep track of your code so you can:

  1. Go back to your last working version
  2. Keep your code in case your subscription is disabled

Create a Database

Since PriceScraper is a web app, it is extremely important to find a way to keep track of data across all users. This is another instance where using Azure is a benefit: we get to host a database with minimum overhead cost and easy set up.

The first step is to create the actual database in the Azure portal. However, it is important to note that there are many options on Azure for databases. Make sure to search ‘SQL databases’ in the top search bar.

Finding the SQL database in the Azure portal

After selecting ‘SQL databases’, click + on the top left corner of the screen:

A SQL database instance after deployment

Lastly, fill out the basic information for your database. If you have not already, create a subscription, then name your database and select a server closest to your location. Most importantly, make sure to properly configure your database (link in compute+storage section in form) so that it requires the LEAST amount of money you can spend. Since this project will most likely be for personal use, you can minimize the database specifications to cut costs to a couple dollars:

The configuration form for a SQL database instance

After a couple minutes, the database should be created, and you will be led to a page similar to this

The database home page

The database is nearly setup, save one important step; configuring SQL authentication. If you go to query editor from the left menu, you will be prompted to create a username and password to access and modify the database.

Make sure to remember your username and password for the database!! You will be using it in future steps.

Connect the Database to VSCode:

In order to regularly access the database, it is easiest to use an extension in VSCode and have it right alongside the rest of our app. On the left hand bar in VSCode, go to the icon resembling several blocks and search up ‘Azure Databases’. From there, click install and wait for the extension to appear as an icon on the bar.

Downloading a database extension in VS Code

Next, we need to actually link our Azure account to VSCode. Press Ctrl + shift + p to open the command bar, then type ‘SQL’ and choose ‘MS SQL: Connect’. You might be prompted with a message telling you to sign into your Microsoft credentials, with a link to the external site. In that case, log in and authorize all permissions.

Connecting to MSSQL with the VSCode command bar

Next, go to the extension on the left bar and hover over ‘connections’. Click on the ‘+’ symbol that appears and a menu similar to the command bar will pop up.

Inputting the database URL in the VSCode command bar

When prompted for the database link, go to the database in azure portal, and server name is in overview on the far right.

Locating a database URL

After inputting the server name, press enter to continue or put in db name if you want. Then, select ‘SQL Login’ and use your SQL login username and password you had created for authentication. Follow the next couple of prompts related to the SQL login, then you should be connected, with the name of your database and a green dot showing up when the extension is open:

A database connected in VSCode

Step 2: Creating Columns for the Database

https://media.giphy.com/media/xT5LMHxhOfscxPfIfm/giphy.gif

It finally time to start using our database! Click on your database, then navigate to the ‘Tables’ directory. Notice one dataset that says DBO: this is the set we are branching off of to create our table. To create the table, we are going to create SQL queries specifying the columns and their datatypes. Right click on the database name and select ‘New Query’, and a new document should pop up. In this document, copy and paste the following query:

CREATE TABLE dbo.ScrapedData (
    phonenumber FLOAT(100),
    baseline_percentage FLOAT(100),
    duration FLOAT(100),
    original_price FLOAT(100),
    current_price FLOAT(100),
    link VARCHAR(400)
)
Enter fullscreen mode Exit fullscreen mode

The datatype capacities are somewhat arbitrary save for the link and phone number columns. Since the phone number will have at least 10 digits, it is better to stick to a high capacity. Same goes for the link; many URLS are extremely long and would actually max out 255 characters.

Step 3: Creating Functions

Now it is time to set up our functions. We will be creating an HTTP trigger, which has a url as an endpoint, as well as a timer trigger, which independently goes off after a specified interval of time. As previously mentioned Python is not as easily integrated with the Azure portal as other runtime stacks are, so we will be using VSCode extensions in order to program our app.

Go through this quickstart guide, which should take around thirty minutes to complete in order to learn how to set up a Python function in VSCode. As with version control, we will not explicitly be going over deploying your app from your local machine to the remote version, but that information can be found in the guide. After you have followed the instructions, you should have created an HTTP triggered function. Make sure to make a folder for all of your code for this app.

A tutorial on how to create and deploy a function

Creating a Timer Trigger

To create a timer trigger, follow the same steps outlined in the guide, but change the type from HTTP to timer in the menu bar. If you are prompted for an N-CRONTAB time interval for when the function is triggered, use this link to generate your desired interval.

Setting Up Twilio

Twilio is an awesome service that we will be using in order to send SMS messages to users. Since we will need a Twilio account to access its API in our functions, we will be setting it up right now. Go to the Twilio website and create a new account. If you already have an account, feel free to use it so long as you have billing set up. Otherwise, create a new account in order to get a free trial.

Important Note: not having billing set up with your account will limit you to sending messages only to manually authorized numbers. This will restrict your app to people you personally know.

Step 4: Inputting Data into the Database

It is finally time to start coding! We will begin by programming our HTTP trigger, which will process the user’s request through the webform, and create a row in the database using the information provided.

The flowchart for an HTTP trigger

Go to the topmost icon in VSCode’s left panel, click add folder to workspace, and navigate to where you saved your functions. Then in your function, go the nested folder with the same name as the main directory, and open init.py.

A GIF on locating the init.py file

When you open up your init.py file, you will have a boilerplate for creating a basic HTTP trigger. This is where all of our code for the trigger will live.

Connecting to the Database:

Now that we have our function ready to go, we need to set up a connection with our database. The first step is to install a VERY important python package in pyodbc. Go to this link for the pip command and install it to your trigger’s directory in the terminal. After installation is complete, make sure to import pyodbc at the top of init.py.

Pyodbc is perhaps the most complex dependency in this entire project, but the gist is that it allows you to connect to a database and write SQL queries within Python. Below is the code initializing a connection instance that we will be using in our function:

import pyodbc

server = '<db_link>'
database = '<db_name>'
username = '<your_user>'
password = '<your_password>'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
Enter fullscreen mode Exit fullscreen mode

The variable dbCnxn is the actual connection to the datababase, and ‘cursor’ is an object that lets you perform actions through that connection to actually query the database. Before running, make sure that your local environment is to the database. Make sure that all of these Check the database extension and make sure that the database has a green dot instead of a red on)

Important note: since we are programming this locally, the database is being accessed through varying IP adresses since varying ports are used every time we run a function. Left unchecked, this is a huge problem since our database requires manual authorization of IP adresses to get through its firewall. Therefore we need to configure our firewall settings so that ALL IP adresses are allowed to query the database, since we never know what IP address our local server will access the database endpoint through. Go to database in the Azure protal and select Overview in the left tab. On the righthand menubar, select set server firewall:

Locating the database firewall settings in the Azure portal

You will be redirected to a page called ‘firewall’ settings’, where you can give a range of authorized ip addresses. Go to ‘Rule Name’ and enter in a description like “allows all”, then put in 0.0.0.0 as the starting IP and 255.255.255.255 of ip addresses, which allows for every combination possible in IPv6.

Alt Text

Now it is time to test the code. Your code should look like this so far:

import logging

import azure.functions as func
import pyodbc

server = '<db_link>'
database = '<db_name>'
username = '<your_user>'
password = '<your_password>'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')

    if name:
        return func.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return func.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
             status_code=200
        )
Enter fullscreen mode Exit fullscreen mode

Refer back to this link to run your function. In your terminal, you will most likely see connection errors, but these issues vary too widely to cover in this tutorial. Pyodbc can be VERY frustrating, so my advice is to make Google and StackOverflow your best friends! Search up any error that comes up and follow verified instructions until something sticks. If you experience any issues with PyLint, refer to this link to delete and create a new virtual environment for your function. You are good to go once your terminal looks similar to the following:

The output shown in VS Code Terminal panel when debugging locally

Specifying Request Parameters:

Finally! Our technologies are set up and we can query into our database using Python. Here is a quick reminder on what this trigger should do:

  1. Take request parameters sent by the user through the webform
  2. Using the URL provided, webscrape the price off of the site
  3. Update the database with a new row with all of the request parameters along with the original price and current price for the item (which are equivalent)

In order to accept various request parameters, we need to edit the function.json file to include all of the the form information. It should be right under the init.py file in your directory. Edit and save your file so that it matches the following JSON:

{
    "scriptFile": "__init__.py",
    "bindings": [
      {
        "authLevel": "anonymous",
        "type": "httpTrigger",
        "direction": "in",
        "name": "req",
        "phonenumber": "req",
        "url": "req",
        "baseline_percentage": "req",
        "duration": "req",
        "methods": [
          "get",
          "post"
        ]
      },
      {
        "type": "http",
        "direction": "out",
        "name": "$return",
        "url": "$return",
        "phonenumber": "req"
      }
    ]
  }
Enter fullscreen mode Exit fullscreen mode

Webscraping Prices with a URL:

Now that we can access all of the information the user inputs, we need to get the information about the price as well. The user only inputs a URL, so it is our job to somehow retrieve the price with it.

We will be using another great python library called Beautiful Soup, which allows us to webscrape, or parse the HTML off of a site. Webscraping is so central to this project that PriceScraper was named after it! Because webscraping is so important, choosing the simplest and most efficient way to scrape the price is a priority. Beautiful Soup is extremely beginner friendly, and is the largest reason we are using Python functions.

In order to install Beautiful Soup, call pip install bs4 in your terminal or search up ‘beautiful soup download’. After installing, make sure to import the package:

from bs4 import BeautifulSoup
Enter fullscreen mode Exit fullscreen mode

You will also need to download and import the requests package:

import requests
Enter fullscreen mode Exit fullscreen mode

Once everything is imported, copy and paste the following function into init.py:

def scrape_price(URL: str):
    headers = ({'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
            'Accept-Language': 'en-US, en;q=0.5'})

    page = requests.get(URL, headers=headers) #response.text gets page and html
    soup = BeautifulSoup(page.text, 'html.parser')

    price_tag = soup.find(id='priceblock_ourprice')
    if price_tag is None:
        price_tag = soup.find(id='priceblock_dealprice')
    if price_tag is None:
        price_tag = soup.find('data-asin-price')
    if price_tag is None:
        price_tag = soup.find(id='price_inside_buybox')
    if price_tag is None:
        price_tag = soup.find(class_='p13n-sc-price')
    if price_tag is None:
        return "incompatible"
    price = price_tag.get_text()
    #gets the lowest price if a range is listed
    if('-' in price):
        price = price[:price.index('-')]

    #turn into number
    price = price.replace('$', '')
    price = float(price)
    return price
Enter fullscreen mode Exit fullscreen mode

Notice the page variable, which gets the actual page using the URL parameter. Soup is a Beautiful Soup object that takes the page, and parses out its HTML. The method soup.find() selects an individual element through an attribute (in this case the element’s id).

One thing to note is that the function will not work for some sites. Amazon is a huge platform with different layouts for deals and discounts and so on, making it difficult to know which attribute to select and ensuring that your program will not work in every case. To make sure that the program does not crash, we check multiple elements for the price, then return “incompatible” if it is not found. Otherwise, the value is cleaned up into only numeric characters, casted into a float value, and returned.

Updating the Database:

We now have all of the information needed to create a row in the database for the user’s request. Along with the request parameters, we will input the webscraped price two times, once as the original price (the price of the product at the time the request is processed), and once as the current price (which will be updated every time the program checks for price fluctuations).

We are going to use the pyodbc variables that we created in previous steps to query into the database and insert all of the values:

def update_database(url:str, phonenumber:int, baseline_percentage:float, duration:float, original_price:float, current_price:float, item_name: str):
    row_str = ""
    cursor.execute("INSERT INTO dbo.ScrapedData VALUES (?,?,?,?,?,?,?)", int(phonenumber),float(baseline_percentage), float(duration), float(scrape_price(url)), float(scrape_price(url)), url, item_name) 
Enter fullscreen mode Exit fullscreen mode

This is actually all the code you need to insert a new row! However, we want to make sure that a user can only track the same URL once. To do this, we are going to create another query to delete a row if the phonenumber and link exist elsewhere. Afterwards, make sure to call ‘cnxn.commit()’ in order to actually modify the database’s contents:

cnxn.commit()
    cursor.execute('''WITH removing AS (
    SELECT 
        phonenumber, 
        baseline_percentage, 
        duration, 
        original_price,
        current_price, 
        link,
        ROW_NUMBER() OVER (
            PARTITION BY 
                phonenumber,  
                link
            ORDER BY 
                phonenumber, 
                link
        ) row_num
        FROM 
            dbo.ScrapedData
        )
        DELETE FROM removing
        WHERE row_num > 1;''')

    cnxn.commit()
Enter fullscreen mode Exit fullscreen mode

When testing our function, it would be convenient to see the rows in the table without constantly running a query through hitting a button in VSCode or the Azure portal. In order to do so, add one more query to the function that selects all of the rows. Then, store each row in a string and return the value:

 cnxn.commit()
    cursor.execute("SELECT * FROM [PriceScraper].[dbo].[ScrapedData]")
    row = cursor.fetchone()
    while row:
        row_str += row.__repr__() + "\n"
        row = cursor.fetchone()
    return row_str
Enter fullscreen mode Exit fullscreen mode

Your complete function should look like this:

def update_database(url:str, phonenumber:int, baseline_percentage:float, duration:float, original_price:float, current_price:float, item_name: str):
    row_str = ""
    cursor.execute("INSERT INTO dbo.ScrapedData VALUES (?,?,?,?,?,?,?)", int(phonenumber),float(baseline_percentage), float(duration), float(scrape_price(url)), float(scrape_price(url)), url, item_name) 

    cnxn.commit()
    cursor.execute('''WITH removing AS (
    SELECT 
        phonenumber, 
        baseline_percentage, 
        duration, 
        original_price,
        current_price, 
        link,
        ROW_NUMBER() OVER (
            PARTITION BY 
                phonenumber,  
                link
            ORDER BY 
                phonenumber, 
                link
        ) row_num
        FROM 
            dbo.ScrapedData
        )
        DELETE FROM removing
        WHERE row_num > 1;''')

    cnxn.commit()
    cursor.execute("SELECT * FROM [PriceScraper].[dbo].[ScrapedData]")
    row = cursor.fetchone()
    while row:
        row_str += row.__repr__() + "\n"
        row = cursor.fetchone()
    return row_str
Enter fullscreen mode Exit fullscreen mode

Calling our Functions and Handling Requests:

Now that our functions are written, we need to actually use them! All of the functions require the parameters, so the first thing we need to do is set up a way to access the request parameters. In the boilerplate for an HTTP trigger, the main function is responsible for getting this information:

import logging

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')

    if name:
        return func.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return func.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
             status_code=200
        )
Enter fullscreen mode Exit fullscreen mode

Notice the variable name, which is assigned as a property of req, or the request for the function. Use name as reference to getting our parameters:

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    phonenumber = req.params.get('phonenumber')
    url = req.params.get('url')
    baseline_percentage = req.params.get('baseline_percentage')
    duration = req.params.get('duration')
    name = req.params.get('name')
Enter fullscreen mode Exit fullscreen mode

The next thing we need to do is check whether each parameter is sent in, most especially the URL. If all of the parameters are sent in, we can return a response to the request, which calls update_database(). In this function call, update_database() is given the form parameters as well as the original price and the (equivalent) current price. Returning the function call as a response will output the return value of update_database(), which we set to be the rows contained in the database.

all_vals = {"phone": phonenumber, "url": url, "percent": baseline_percentage, "dur": duration, "name": name}
    null_valls = ""

    if not phonenumber:
        null_valls += "phonenumber, "
    if not url:
        null_valls += "url, "
    if not baseline_percentage:
        null_valls += "baseline_percentage, "
    if not duration:
        null_valls += "duration, "
    if not name:
        null_valls += "name" 

    if url and scrape_price(url) == "incompatible":
        return func.HttpResponse(
             f'url unsupported',
             status_code=200
        )
    if url and phonenumber and baseline_percentage and duration and name:
        return func.HttpResponse(update_database(url, phonenumber, baseline_percentage, duration, scrape_price(url), scrape_price(url), name))

    else:
        return func.HttpResponse(
             null_valls,
             status_code=200
        )
Enter fullscreen mode Exit fullscreen mode

And with that, our HTTP trigger is complete! To test the trigger, run it in terminal and copy and paste the URL provided. I highly recommend using the Postman API Client with this URL as an endpoint. From there, you can input test parameters and see whether the output is correct. Another alternative, of course, is do test in a web browser, but inputting parameters in a search bar is extremely frustrating and inefficient.

Step 5: Checking for Fluctuations and Sending Notifications

Step 4 was all about giving our PriceScraper some data to go off of. Now, we get to work on retrieving and updating that data! Through programming a timer trigger, we will:

  1. Check whether we need to notify any users
  2. Send the notifications
  3. Update the database with the current prices for each item

Here is a flowchart outlining the logic for our code:

An HTTP Trigger Flowchart

Setting Up Dependencies:

You are going to need every dependency used in the previous trigger, along with some new packages. Navigate to init.py in your timer trigger function, then copy and paste every import. Refer back to the instructions and make sure you have downloaded and imported:

  • requests
  • pyodbc
  • beautiful soup

After importing pyodbc, make sure to copy the following code, with each variable being global:

server = '<db_link>'
database = '<db_name>'
username = '<your_user>'
password = '<your_password>'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
Enter fullscreen mode Exit fullscreen mode

You will additionally need to download Twilio package and import the following to send SMS messages:

import os 
from twilio.rest import Client
Enter fullscreen mode Exit fullscreen mode

Getting Database Information:

We will be using a pyodbc cursor object again to query into the database and create parallel lists for each column. These lists which can then be used to check who needs to be notified and draft a message with the relevant information.

def get_Database_Information():
    phonenumbers = []
    names = []
    links = []
    original_prices = []
    current_prices = []
    percentages = []
    durations = []

    cursor.execute('SELECT phonenumber,link,original_price FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    row_str = ""
    while row:
        row_str += row.__repr__() + "\n"
        row = cursor.fetchone()

    # get all of each type of data
    cursor.execute('SELECT phonenumber FROM dbo.ScrapedData;')
    row = cursor.fetchone() 
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        row_str = float(row_str)
        phonenumbers.append(row_str)
        row = cursor.fetchone()

    cursor.execute('SELECT item_name FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        names.append(row)
        row = cursor.fetchone()

    cursor.execute('SELECT link FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        links.append(row)
        row = cursor.fetchone()

    cursor.execute('SELECT original_price FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        row_str = row_str.replace(' ', '')
        row_str = float(row_str)
        original_prices.append(row_str)
        row = cursor.fetchone()

    cursor.execute('SELECT current_price FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        row_str = row_str.replace(' ', '')
        row_str = float(row_str)
        current_prices.append(row_str)
        row = cursor.fetchone()

    cursor.execute('SELECT baseline_percentage FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        row_str = row_str.replace(' ', '')
        row_str = float(row_str)
        percentages.append(row_str)
        row = cursor.fetchone()

    cursor.execute('SELECT duration FROM dbo.ScrapedData;')
    row = cursor.fetchone()
    while row:
        row_str = row.__repr__()
        row_str = row_str.replace('(','')
        row_str = row_str.replace(')','')
        row_str = row_str.replace(',','')
        row_str = row_str.replace(' ', '')
        row_str = float(row_str)
        durations.append(row_str)
        row = cursor.fetchone()

    return phonenumbers, names, links, original_prices, current_prices, percentages, durations

Enter fullscreen mode Exit fullscreen mode

At the end, the function returns all of the lists.

Finding Who Needs to Be Notified:

There are three cases in which a user will be notified by PriceScraper:

  1. The price has increased
  2. The price has meets the user’s ideal price
  3. The duration the user wanted notifications for has been reached

We will be writing methods for each of these cases. The parameters for each method is different, but all of them require lists outputted by our get_database_information() method:

#if price increases from current
def price_increased_from_current(current_prices, links, phonenumbers, names):
    increased_notifs = []

    for i in range(len(current_prices)):
        updated_current_price = scrape_price(links[i], current_prices[i])
        if updated_current_price > current_prices[i]:
            increased_notifs.append([phonenumbers[i], names[i], current_prices[i], updated_current_price, links[i]])
    return increased_notifs
    #use to send to phone number 'price for <item> has increased from <old current> to <new current>

#if price is within baseline percentage
def price_increased_from_baseline(original_prices, links, phonenumbers, percentages, current_prices, names):
    baseline_notifs = []

    for i in range(len(original_prices)):
        original_price = original_prices[i]
        baseline_price = original_price - (original_price * percentages[i] * .01)
        updated_current_price = scrape_price(links[i], current_prices[i])
        current_price = current_prices[i]
        if current_price <= baseline_price:
            baseline_notifs.append([phonenumbers[i], names[i], updated_current_price, links[i]])
    return baseline_notifs

#if duration reached
def duration_reached(phonenumbers, durations, names, links, current_prices):
    duration_notifs = []

    today = datetime.date.today()

    for i in range(len(durations)):
        updated_current_price = scrape_price(links[i], current_prices[i])
        if(durations[i] == 0):
            duration_notifs.append([phonenumbers[i], names[i], updated_current_price, links[i]])

    return duration_notifs
Enter fullscreen mode Exit fullscreen mode

Each function outputs a contact list with all the information needed to compose a notification message for each user.

Note: the timer trigger also uses the scrape_price() function, which you can copy and paste from below:

def scrape_price(URL: str, current_price):
    headers = ({'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
            'Accept-Language': 'en-US, en;q=0.5'})

    '''
    URL = str(URL)
    if 'dp/' in URL:
        url_short = URL[URL.index('dp/'):]
        if '?' in url_short:
            url_short = url_short[: URL.index('?')]
    '''


    page = requests.get(URL[0], headers=headers) #response.text gets page and html
    soup = BeautifulSoup(page.text, 'html.parser')

    price_tag = soup.find(id='priceblock_ourprice')
    if price_tag is None:
        price_tag = soup.find(id='priceblock_dealprice')
    if price_tag is None:
        price_tag = soup.find('data-asin-price')
    if price_tag is None:
        price_tag = soup.find(id='price_inside_buybox')
    if price_tag is None:
        price_tag = soup.find(class_='p13n-sc-price')
    if price_tag is None:
        return current_price
    price = price_tag.get_text()
    #gets the lowest price if a range is listed
    if('-' in price):
        price = price[:price.index('-')]

    #turn into number
    price = price.replace('$', '')
    price = float(price)
    return price
Enter fullscreen mode Exit fullscreen mode

Sending Messages:

Now it is time to create our reminder system! The actual function to send a message is very simple:

def send_message(phonenumber, message): 
    # Your Account Sid and Auth Token from twilio.com/console
    # and set the environment variables. See http://twil.io/secure
    account_sid = '<your account sid>'
    auth_token = '<your auth token>'
    client = Client(account_sid, auth_token)
    to_str = '+1{}'.format(phonenumber)
    message = client.messages \
                    .create(
                        body=message,
                        from_='<your account phone number>',
                        to= to_str[0:len(to_str)-1]
                    )

    print(message.sid)
Enter fullscreen mode Exit fullscreen mode

Make sure to input your account sid, auth token, and phone number into the function. This information can be found through the Twilio console, but make sure to never copy and paste them into the code. If you happen to push that code with github, anyone can access your information. Beyond that, Twilio screens to check if your information was posted to the internet, and automatically disables those credentials. Instead, set these credentials as environment variables.

After the helper function works, we will move on to creating a bigger method that iterates through each of the contact lists made from price_increased_from_current(), price_increased_from_baseline(), and duration_reached(). Each contact list contains sub lists with the phonenumber, product name, original price, current price, and URL, and is traversed in an individual loop. One by one, messages are sent out to each user on the list for each notification type:

def send_messages(increased_from_current, increased_from_baseline, reached_durations):

    for i in increased_from_current:
        phonenumber = i[0]
        name = i[1]
        old_price = i[2]
        current_price = i[3]
        link = i[4]
        send_message(phonenumber, "Product {} has increased in price from ${} to ${}\nLink to product: {}".format(name[0], old_price, current_price, link[0]))

    for i in increased_from_baseline:
        phonenumber = i[0]
        name = i[1]
        current_price = i[2]
        link = i[3]
        send_message(phonenumber, "Product {} meets your ideal discount percentage at ${}!\nLink to product: {}".format(name[0], current_price, link[0]))

    for i in reached_durations:
        phonenumber = i[0]
        name = i[1]
        current_price = i[2]
        link = i[3]
        send_message(phonenumber, "The set duration for {} is complete! The current price for this product is ${}\nLink to product: {}".format(name[0], current_price, link[0]))
Enter fullscreen mode Exit fullscreen mode

Updating the Current Price:

In order to keep track of fluctuations, the PriceScraper needs to be able to update the current price to the most recent time it was checked. We do this at the very end of the program so that we are not comparing the current price to itself when checking for notifications.

   def get_current_prices(phonenumbers, prices, urls):

        for i in range len(urls):
            if(scrape_price(urls[i]) != prices[i]):
                cursor.execute('UPDATE dbo.ScrapedData SET current_price = {} WHERE phonenumber LIKE \'%{}%\' AND link LIKE \'%{}%\';'.format(scrape_price(url[i]), phonenumbers[i], urls[i]))
                cursor.commit()
Enter fullscreen mode Exit fullscreen mode

All this function requires is the phone numbers, prices, and urls in the database. The url is used to scrape the current price off of the site, then is compared to the price listed in the database for the product. In the query, a row is modified if the scraped price is different from the. listed price. This is where the phonenumbers comes in: we specified in the HTTP trigger that a phonenumber can only correspond to a given URL once. Therefore, in the query, we make sure to change only the row with both that phoennumber and link so as not to change the current price for all users.

Tying Everything Together:

Now, we will call all of the methods we just created in the main function. To call every other method, we first need to create a list for every column in the database using get_database_information():

def main(mytimer: func.TimerRequest) -> None:
    phonenumbers = get_Database_Information()[0]
    names = get_Database_Information()[1]
    links = get_Database_Information()[2]
    original_prices = get_Database_Information()[3]
    current_prices = get_Database_Information()[4]
    percentages = get_Database_Information()[5]
    durations = get_Database_Information()[6]
Enter fullscreen mode Exit fullscreen mode

Using these lists, we will make our contact lists by calling price_increased_from_current(), price_increased_from_baseline(), and duration_reached():

increased_from_current = price_increased_from_current(current_prices, links, phonenumbers, names)
    increased_from_baseline = price_increased_from_baseline(original_prices, links, phonenumbers, percentages, current_prices, names)
    reached_durations = duration_reached(phonenumbers, durations, names, links, current_prices)
Enter fullscreen mode Exit fullscreen mode

Lastly, we will send the notifcations and update the database:

    send_messages(increased_from_current, increased_from_baseline, reached_durations)
    get_current_prices(phonenumbers, prices, urls)

    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    logging.info('DATA: %s', get_Database_Information())
Enter fullscreen mode Exit fullscreen mode

Note: make sure to keep the boiler plate code!! This is what makes sure your function is triggered at the right intervals.

This is what messages sent through the function will look like:

A mock text message sent from the timer trigger

Step 6: Developing the Frontend

We are so close to finishing the PriceScraper! All that is left is creating an interface for users to actually access the back end through. We will develop a very simple form using HTML, CSS, and JavaScript. However, since we want our PriceScraper to be publicly accessible, we will create an Azure Static Webapp to deploy it to the Internet.

Setting Up a GitHub Account:

The first step to creating a static webapp is to have a GitHub Account. GitHub is a host for coding repositories, meaning that it is used for keeping track of different versions of a project and across different collaborators and tasks. Azure Static Webapps requires you to deploy your code to GitHub, then deploy it again to the webapp instance located in the Azure Portal. This link is provides great walk through on how to set up GitHub and an Azure Static Webapp, then connect both to VSCode.

After learning how to create and deploy a webapp, open the index.html page in your local webapp and paste the following code:

<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script defer src="script.js" type="text/javascript"></script>
    <link rel="stylesheet" href="styles.css">
    <title>Amazon Webscraper</title>
</head>
<body>
    <h1>Price Scraper</h1>
    <form onsubmit="jsonifyData(event)" action="database.php" method="POST">
        <h4>Phone number: </h4> 
        <input id="phonenumber" name="phonenumber" type="tel" placeholder="only numbers" required class="field" autocomplete="off">

        <h4>Product Name </h4>
        <input id="product_name" name="product_name" type="text" placeholder="input the product price" required class="field" autocomplete="off"> 

        <h4>Product URL: </h4>
        <input id="url" name="url" type="url" placeholder="input the product url" required class="field" autocomplete="off">

        <h4>Baseline discount percentage: </h4>
        <input id="percentage" name="percentage" type="number" placeholder="-" required class="field" autocomplete="off">    

        <h4>Notification time frame (days): </h4>
        <input id="duration" name="duration" type="number" placeholder="-" required class="field" autocomplete="off"> 

        <br>
        <br>
        <button type="submit" onsubmit="jsonifyData(event)" class="submit">Submit</button>

        <br>
        <span class="form-status" id="popup"></span>



    </form>


</body>
</html>
Enter fullscreen mode Exit fullscreen mode

This has all of the code for the front end, but is somewhat primitive looking. You can either style this code by yourself or refer to my styling:

The styled webform

And there you have it! We have a beautiful front end for the PriceScraper.

Handling Form Data:

The gif above is a demo of the finished product, with the data actually being sent to our HTTP trigger. We need to validate the user input so that they put in working links and the right type of data. This is easiest to do in JavaScript, which is optimized for compatibility with HTML. In the directory for your style sheets and HTML, create a file called script.js. Then copy and paste the following code:

function validatePhonenumber(phonenumber)
{
    var numberType = typeof phonenumber;
    var regExp = /[a-zA-Z]/g; //any letters

    if( numberType != "number")
    {
        var noHyphens = phonenumber.replaceAll('-', '');
        console.log(noHyphens + ", length: " + noHyphens.length);
        if((noHyphens).length != 10 || (regExp.test(phonenumber)))
            return null;

        return noHyphens
    }


    return phonenumber;


}

function cleanDuration(duration)
{
    console.log("dur peeps");
    const currentDate = new Date();
    var stopDate = new Date();

    stopDate.setDate(currentDate.getDay() + duration);

    return duration;
}


function cleanPercentage(percentage)
{
    if(percentage > 100)
        percentage = percentage % 100;

    return percentage;
}


function validateURL(url)
{
    var request = new XMLHttpRequest();  
    request.open('GET', url, true);
    request.onreadystatechange = function()
    {
        if (request.readyState === 4)
        {
            if (request.status === 404) 
            {  
                return null;
            }  
        }
    };

    return url;
}
Enter fullscreen mode Exit fullscreen mode

The code above validates and cleans the user input to work with our function code. For example, the function validatePhonenumber() makes sure hyphens are erased from the input and the length is verified to be 9 - 10 digits. The main function for this script is jsonifyData(). The first half of the function gets the form input when the submit button is pressed:

function jsonifyData(event)
{
    event.preventDefault();
    var phonenumber = document.getElementById('phonenumber').value;
    var product_name = document.getElementById('product_name').value;
    var url = document.getElementById('url').value;
    var percentage = document.getElementById('percentage').value;
    var duration = document.getElementById('duration').value;

    if(validateURL(url) == null)
    {
        $('#popup').html("INVALID URL");
        return;
    }  
    if(validatePhonenumber(phonenumber) == null)
    {
        document.getElementById("popup").innerHTML = "INVALID PHONENUMBER";
        //$('#popup').html("INVALID PHONENUMBER");
        return;
    }  
Enter fullscreen mode Exit fullscreen mode

If the data is not valid, the process is terminated and the function sends an error message to the user. Otherwise, the cleaned data is sent as parameters for an XHTTP request, which posts the parameters from the form to the trigger:

var params = 'url=' + url + '&phonenumber=' + validatePhonenumber(phonenumber) + '&duration=' + duration + '&baseline_percentage=' + percentage + '&name=' + product_name.replaceAll(' ', '-');
    var http = new XMLHttpRequest();
    //TRIGGER LINKS: 'https://pricescrapertester.azurewebsites.net/api/SQLFormCompleter?'   'http://localhost:7071/api/SQLFormCompleter'
    var trigger_url = '<your http trigger link>';

    http.open('POST', trigger_url+params, true);

    http.setRequestHeader('Content-type', 'application/multipart/form-data');

    http.onreadystatechange = function() {//Call a function when the state changes.
        if(http.readyState == 4 && http.status == 200) {
            alert(http.responseText);
        }
    }
    console.log("POST: " + trigger_url+params);
    console.log(params);

    document.getElementById("popup").innerHTML = "FORM STATUS: SUCCESS";
Enter fullscreen mode Exit fullscreen mode

Make sure to use your own trigger urls, which can be found in ‘Overview’ for each function app. Once script.js works, redeploy your static webapp.

Step 7: CELEBRATING!!

Alt Text

Congratulations! You have built a web app from scratch and, more importantly, have saved yourself plenty of money and frustration when it comes to shopping on Amazon. This is no easy feat, but I hope you have enjoyed coding this project as much as I have. If you would like to refer to my code, here is the link to my Github repository for the project. Thank you for reading this article, and please leave any feedback or questions in the comments.

Special Thanks:

I would also like to thank BitProject and Microsoft, who inspired me to create this project! I am grateful to have learned about cloud computing and Azure in such a fun and engaging way.

💖 💪 🙅 🚩
fifiteklemedhin
FifiTeklemedhin

Posted on December 29, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related