Jordan Scrapes Washington’s Marijuana Producers

aarmora

Jordan Hansen

Posted on October 22, 2020


Demo code here

Hello. The goal with this post is to find the legal names of Washington’s marijuana producers. This would be useful to anyone who wants to market to these producers. With the legal names, you can confirm the owners with the Washington Secretary of State.

We are using two different sites to do this. The first, 502data.com, has a list of all the producers but not their legal names. The second, TopShelfData, has the legal name of each company. Using this legal name, you can easily find the business information from the Washington Secretary of State.

502data.com


After a quick inspection of 502data.com, it was clear that the site uses AngularJS as its framework. Knowing this, I fully expected to see XHR requests carrying the data. But loading https://502data.com/allproducerprocessors produced only two requests, and neither had any relevant information.

xhr requests on 502data.com producers

This really confused me. The data was clearly not there on page load. Here is what the page looked like before all of the JavaScript rendered.

data on page load

My next step was to go through the JavaScript. If the data was getting pulled in via XHR, it had to be referenced somewhere in the JavaScript. Looking at these script files, however, nothing stood out to me as something that would manage the app itself.

script files

Next stop was the root page. Going through the script tags I finally found what I was looking for at the bottom of the page. Jackpot.

script with all of Washington's marijuana producers

See $scope.licenses? That’s what I’m looking for. It’s a huge array of all the marijuana producers in Washington. Checking the length gave me over 1500.

I’d never used cheerio to get script data before but it turned out to be fairly simple.

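    import axios from 'axios';
    import * as cheerio from 'cheerio';

    const url = 'https://502data.com/allproducerprocessors';

    const axiosResponse = await axios.get(url);
    const $ = cheerio.load(axiosResponse.data);

    // html() returns the raw contents of the inline <script> tag that holds the data.
    const script = $('script:nth-of-type(7)').html();

    // Keep only the array assigned to $scope.licenses, up to the first semicolon.
    const scriptSplit = script?.split('$scope.licenses = ');
    let arrayOfbusinesses: any[] = [];
    if (scriptSplit && scriptSplit.length > 1) {
        arrayOfbusinesses = JSON.parse(scriptSplit[1].split(';')[0]);
    }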

The only difference from a typical selector is calling html() instead of text(). After that I just split the html until I was left with only the part I wanted. Then it was simply a matter of JSON.parse().
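One thing to watch with splitting on the first semicolon is that it would cut the array short if a business name ever contained a `;`. Here is a minimal sketch of an alternative that walks the text to the matching closing bracket instead. The helper name `extractLicenses` and the sample script text are made up for illustration; this is not the code from the demo repo.

```typescript
// Hypothetical helper: returns the array assigned to $scope.licenses in raw script text.
function extractLicenses(scriptText: string): unknown[] {
    const marker = '$scope.licenses = ';
    const markerIndex = scriptText.indexOf(marker);
    if (markerIndex === -1) {
        return [];
    }

    const arrayStart = scriptText.indexOf('[', markerIndex + marker.length);
    if (arrayStart === -1) {
        return [];
    }

    // Track bracket depth, ignoring anything inside double-quoted strings.
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = arrayStart; i < scriptText.length; i++) {
        const ch = scriptText[i];
        if (inString) {
            if (escaped) {
                escaped = false;
            } else if (ch === '\\') {
                escaped = true;
            } else if (ch === '"') {
                inString = false;
            }
            continue;
        }
        if (ch === '"') {
            inString = true;
        } else if (ch === '[') {
            depth++;
        } else if (ch === ']') {
            depth--;
            if (depth === 0) {
                // Found the bracket that closes the array; parse just that slice.
                return JSON.parse(scriptText.slice(arrayStart, i + 1));
            }
        }
    }
    return [];
}

// Made-up script text with a semicolon and brackets inside the names.
const sampleScript =
    'var x = 1; $scope.licenses = [{"name":"A; B","tier":0},{"name":"C [D]","tier":1}]; $scope.other = [];';
const licenses = extractLicenses(sampleScript);
```

This assumes the array in the page is valid JSON with double-quoted strings, which is what the JSON.parse() call above already relies on.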

BAM. Just like that I have my producers. Now to get their legal name.

TopShelfData


Off we go to TopShelfData. The registered name is the item for which we are looking.

topshelfdata to get Washington marijuana producer legal names

The data that we have from 502data.com looks like this:

    {
        "licensenumber": "78256",
        "name": "EVERGREEN HERBAL",
        "tier": 0,
        "city": "SEATTLE",
        "county": "KING",
        "totalSales": 26827987.182500,
        "ytdSales": 2887764.770000,
        "lastMonthSales": 588414.440000
    }

So we need to convert the above data into the URL from the above picture. At first I thought I could just lowercase everything and replace the spaces with dashes. But that breaks as soon as more than one business has the same name. As you can see in the photo above, there is a 1 at the end of the URL.
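To make the problem concrete, here is a small sketch of that naive conversion. The function name `toNaiveSlug` and the second producer are hypothetical, and I don't know TopShelfData's actual slug rules, so treat this only as an illustration of why the simple approach is not enough.

```typescript
// Naive slug: lowercase, trim, and turn runs of whitespace into single dashes.
function toNaiveSlug(name: string): string {
    return name.toLowerCase().trim().replace(/\s+/g, '-');
}

// Two producers sharing a name in different cities (the second one is made up).
const sameNameProducers = [
    { name: 'EVERGREEN HERBAL', city: 'SEATTLE' },
    { name: 'Evergreen Herbal', city: 'SPOKANE' },
];

const naiveSlugs = sameNameProducers.map(producer => toNaiveSlug(producer.name));

// Both names collapse to the same slug, so the name alone cannot pick the right page.
const slugsCollide = naiveSlugs[0] === naiveSlugs[1];
```

Since both entries produce the same slug, something else has to decide which business gets the plain slug and which gets the numbered one.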

So…I tried searching to see how TopShelfData narrowed it down.

topshelfdata search results

Bam. We’re in business. The search returns its results over XHR. So I submitted the business name as the query and then picked the suggestion whose city matched the one from 502data.com.

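import axios from 'axios';

// Only the fields this post reads; the real search result may contain more.
interface IBusinessSearchData {
    address_city: string;
    slug: string;
}

export async function getSlugFromTopShelfData(businessName: string, city: string): Promise<IBusinessSearchData | undefined> {
    // Encode the name so spaces and characters like & survive in the query string.
    const url = `https://www.topshelfdata.com/search?query=${encodeURIComponent(businessName)}`;
    const convertedCity = city.toLocaleLowerCase().replace(/\s/g, '-');

    const axiosResponse = await axios.get(url);
    const suggestions: any[] = axiosResponse.data?.suggestions ?? [];

    // Pick the first suggestion whose city matches the city from 502data.com.
    const foundBusiness = suggestions.find(suggestion => suggestion?.data?.address_city?.includes(convertedCity));

    return foundBusiness?.data;
}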
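
The matching step can be tried without hitting the network. This is a minimal sketch that assumes the response shape the function above reads (a `suggestions` array whose items carry `data.address_city` and `data.slug`). The helper name `pickSuggestionByCity` and the sample values are made up.

```typescript
// Shape inferred from the fields the post reads; the real response may carry more.
interface BusinessSearchData {
    address_city: string;
    slug: string;
}

interface Suggestion {
    data?: Partial<BusinessSearchData>;
}

// Hypothetical helper: returns the data of the first suggestion whose city matches.
function pickSuggestionByCity(suggestions: Suggestion[], city: string): Partial<BusinessSearchData> | undefined {
    const convertedCity = city.toLowerCase().trim().replace(/\s+/g, '-');
    const found = suggestions.find(suggestion => suggestion.data?.address_city?.includes(convertedCity));
    return found?.data;
}

// Made-up suggestions: same business name, different cities, plus one empty entry.
const sampleSuggestions: Suggestion[] = [
    { data: { address_city: 'spokane', slug: 'evergreen-herbal' } },
    { data: { address_city: 'seattle', slug: 'evergreen-herbal-1' } },
    {},
];

const seattleMatch = pickSuggestionByCity(sampleSuggestions, 'SEATTLE');
const noMatch = pickSuggestionByCity(sampleSuggestions, 'Tacoma');
```

Because includes() is a substring check, a city whose name is contained in another city's name would also match. Comparing with strict equality is an option if that turns out to be a problem.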

With this, it was simply a matter of navigating directly to the URL and getting the legal name of the business.

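import axios, { AxiosResponse } from 'axios';
import * as cheerio from 'cheerio';

export async function checkTopShelfDataDetails(businessSearchData: IBusinessSearchData) {
    const url = `https://www.topshelfdata.com/wa/${businessSearchData.address_city}/${businessSearchData.slug}`;

    let axiosResponse: AxiosResponse;

    try {
        axiosResponse = await axios.get(url);
    }
    catch (e: any) {
        // Log the HTTP status when there is one, otherwise the low-level error number.
        console.log('e', e.response ? e.response.status : e.errno);
        throw new Error(`Request to ${url} failed`);
    }

    const $ = cheerio.load(axiosResponse.data);

    // The legal name is read from the link inside the third div under .business-info.
    const title = $('.business-info div:nth-of-type(3) a').text();

    console.log('title', title);
}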
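
To show how the pieces could fit together, here is a rough sketch that runs the two steps over a list of producers. Everything named here (`collectLegalNames`, `FindSlug`, `FetchLegalName`) is hypothetical, and it assumes the details step returns the legal name rather than only logging it. The lookups are passed in as functions so the loop can be tried with fake data.

```typescript
// Fields taken from the 502data.com records shown earlier in this post.
interface Producer {
    licensenumber: string;
    name: string;
    city: string;
}

interface SearchHit {
    address_city: string;
    slug: string;
}

// Assumed signatures for the search step and the details step.
type FindSlug = (name: string, city: string) => Promise<SearchHit | undefined>;
type FetchLegalName = (hit: SearchHit) => Promise<string>;

interface LegalNameResult {
    licensenumber: string;
    name: string;
    legalName: string;
}

async function collectLegalNames(
    producers: Producer[],
    findSlug: FindSlug,
    fetchLegalName: FetchLegalName,
): Promise<LegalNameResult[]> {
    const results: LegalNameResult[] = [];

    // One producer at a time, so the target site is not hit with parallel requests.
    for (const producer of producers) {
        try {
            const hit = await findSlug(producer.name, producer.city);
            if (!hit) {
                continue; // No suggestion matched this producer's city.
            }
            const legalName = await fetchLegalName(hit);
            results.push({ licensenumber: producer.licensenumber, name: producer.name, legalName });
        } catch (error) {
            // A failed lookup skips this producer instead of stopping the whole run.
            console.warn('skipping', producer.name, error);
        }
    }

    return results;
}
```

Keying the results by license number rather than by name avoids the duplicate-name problem described above.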

Done. Very fun scrape!


Looking for business leads?

Using the techniques talked about here at javascriptwebscrapingguy.com, we’ve been able to launch a way to access awesome web data. Learn more at Cobalt Intelligence!

The post Jordan Scrapes Washington’s Marijuana Producers appeared first on JavaScript Web Scraping Guy.
