Scrape Walmart Search for a specific store
Illia Zub
Posted on December 2, 2021
Walmart responds with results for Sacramento for requests outside of the US.
But how do you search for products that are available in a specific store? Any Walmart store can be chosen without browser automation, just by setting the relevant cookies in a plain HTTP request.
To figure this out on your own, knowledge of JavaScript and browser dev tools is enough. Some Ruby knowledge is required to understand this post.
Location cookies
I've updated the location several times and checked the browser Dev Tools -> Application -> Cookies.
Several cookies are updated after choosing a different location: locGuestData, locDataV3, assortmentStoreId, ACID, hasACID, and hasLocData.
location-data also looks relevant, but it contains the postal code and address of a store I haven't chosen. Maybe it was used before Walmart migrated to a GraphQL API.
locDataV3 and locGuestData are Base64- and URI-encoded JSON objects. locDataV3 contains more data than locGuestData, but the data of locGuestData can be used for both.
ACID is a UUID. It can be generated on the client.
hasACID and hasLocData are flags.
Understanding locGuestData
Let's check what's inside this cookie value to understand how to set the store ID.
Example of encoded locGuestData
When sending requests to Walmart, locGuestData is a Base64-encoded string.
eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9
Example of decoded locGuestData
This Base64 string is an encoded JSON object.
JSON.parse(decodeURIComponent(atob("eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9")))
{
"intent": "SHIPPING",
"storeIntent": "PICKUP",
"mergeFlag": true,
"pickup": {
"nodeId": "4115",
"timestamp": 1637328500523
},
"postalCode": {
"base": "78154",
"timestamp": 1637328500523
},
"validateKey": "prod:v2:5273e01c-0870-4e09-9858-03a26d497fa9"
}
After changing the Walmart store several times, I've seen that nodeId and postalCode.base are changing.
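Since the rest of the post uses Ruby, here is a minimal Ruby sketch of the same decoding step. The helper name decode_loc_guest_data is hypothetical, and URI.decode_www_form_component is used as a stand-in for decodeURIComponent (it also turns "+" into a space, which does not matter for this sample).

```ruby
require "base64"
require "json"
require "uri"

# Sample value copied from the encoded locGuestData example above.
ENCODED_LOC_GUEST_DATA = "eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9".freeze

# Hypothetical helper: undoes the Base64 and URI encoding, then parses the JSON.
def decode_loc_guest_data(cookie_value)
  raw = Base64.urlsafe_decode64(cookie_value)
  JSON.parse(URI.decode_www_form_component(raw))
end

data = decode_loc_guest_data(ENCODED_LOC_GUEST_DATA)
store_id = data.dig("pickup", "nodeId")        # the store ID as a string
postal_code = data.dig("postalCode", "base")   # the store's postal code
```

The two values that change between stores are the ones pulled out at the end, which is what the cookie-building code later in the post has to fill in.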
Generate timestamp and acid for locGuestData
timestamp and acid can be generated on every request.
require "securerandom"

timestamp = Time.now.to_i   # Unix time in seconds
acid = SecureRandom.uuid    # random UUID used for ACID and validateKey
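One detail worth noting: the timestamp values in the decoded cookie above have 13 digits, which looks like milliseconds, while Time.now.to_i returns seconds. The post does not establish whether Walmart checks this, so the following is only a sketch of how a millisecond value could be produced if it turns out to matter. The helper name epoch_millis is hypothetical.

```ruby
require "securerandom"

# Hypothetical helper: Unix time in milliseconds, computed with integer
# arithmetic to avoid floating-point rounding.
def epoch_millis(time = Time.now)
  time.to_i * 1000 + time.nsec / 1_000_000
end

timestamp = epoch_millis
acid = SecureRandom.uuid
```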
Base64-encode location data
Next, let's Base64-encode that JSON string as Walmart expects.
require "base64"
require "json"
require "securerandom"

# store_id and postal_code are the values of the chosen store.
timestamp = Time.now.to_i
acid = SecureRandom.uuid

location_guest_data = {
  intent: "SHIPPING",
  storeIntent: "PICKUP",
  mergeFlag: true,
  pickup: {
    nodeId: store_id,
    timestamp: timestamp
  },
  postalCode: {
    base: postal_code,
    timestamp: timestamp
  },
  validateKey: "prod:v2:#{acid}"
}

encoded_location_data = Base64.urlsafe_encode64(JSON.dump(location_guest_data))
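A quick way to sanity-check the encoding step is a round trip: encode the hash, decode it again, and compare. The sketch below does that with the sample values from the decoded cookie shown earlier; it is a check of the encoding only, not a statement about what Walmart accepts.

```ruby
require "base64"
require "json"

# Sample values taken from the decoded cookie earlier in the post.
location_guest_data = {
  intent: "SHIPPING",
  storeIntent: "PICKUP",
  mergeFlag: true,
  pickup: { nodeId: "4115", timestamp: 1637328500523 },
  postalCode: { base: "78154", timestamp: 1637328500523 },
  validateKey: "prod:v2:5273e01c-0870-4e09-9858-03a26d497fa9"
}

# Encode the same way as above, then decode to confirm nothing is lost.
encoded = Base64.urlsafe_encode64(JSON.dump(location_guest_data))
round_tripped = JSON.parse(Base64.urlsafe_decode64(encoded), symbolize_names: true)
```

urlsafe_encode64 uses "-" and "_" instead of "+" and "/", so the encoded value never contains the latter two characters.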
Create cookie string
Finally, a location cookie string contains all the required fields.
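# locDataV3 reuses the encoded locGuestData payload, since the data of locGuestData can be used for both.
%(ACID=#{acid}; hasACID=true; hasLocData=1; locDataV3=#{encoded_location_data}; assortmentStoreId=#{store_id}; locGuestData=#{encoded_location_data})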
Complete function to create Walmart location cookie
Putting it all together.
require "base64"
require "json"
require "securerandom"

def location_cookie(store_id, postal_code)
  # blank? comes from ActiveSupport; this check also works in plain Ruby.
  return if store_id.nil? || store_id.to_s.strip.empty?

  timestamp = Time.now.to_i
  acid = SecureRandom.uuid

  location_guest_data = {
    intent: "SHIPPING",
    storeIntent: "PICKUP",
    mergeFlag: true,
    pickup: {
      nodeId: store_id,
      timestamp: timestamp
    },
    postalCode: {
      base: postal_code,
      timestamp: timestamp
    },
    validateKey: "prod:v2:#{acid}"
  }

  encoded_location_data = Base64.urlsafe_encode64(JSON.dump(location_guest_data))

  # locDataV3 reuses the encoded locGuestData payload.
  %(ACID=#{acid}; hasACID=true; hasLocData=1; locDataV3=#{encoded_location_data}; assortmentStoreId=#{store_id}; locGuestData=#{encoded_location_data})
end
Then make an HTTP request using the language and libraries you've chosen.
import got from 'got';

const STORE_ID = "4115";
const POSTAL_CODE = "78154";

// getLocationCookie is assumed to be a JavaScript port of location_cookie above.
const locationCookie = getLocationCookie(STORE_ID, POSTAL_CODE);

const htmlResponse = await got('https://www.walmart.com/search?q=cookie', {
  headers: {
    cookie: locationCookie
  }
});
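For readers staying in Ruby, the same request can be sketched with the standard library's Net::HTTP. The helper names build_search_request and fetch_search_html are hypothetical, and the cookie argument is assumed to be the string returned by location_cookie. Walmart may require additional headers that are not covered here.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a search request that
# carries the location cookie.
def build_search_request(query, cookie)
  uri = URI("https://www.walmart.com/search")
  uri.query = URI.encode_www_form(q: query)

  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = cookie
  request
end

# Hypothetical helper: sends the request over HTTPS and returns the HTML body.
def fetch_search_html(query, cookie)
  request = build_search_request(query, cookie)
  uri = request.uri

  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request).body
  end
end
```

Calling fetch_search_html("cookie", location_cookie("4115", "78154")) would then mirror the JavaScript example. Building the request separately from sending it makes the cookie header easy to inspect before any traffic goes out.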
Where to get store ID and postal code
We wouldn't hard-code the store ID and postal code into a web scraping program, though. A CSV of 4.6k stores can be used to find the store ID dynamically.
Programmatic usage of the CSV is out of the scope of this post. All that is needed is to look up the store ID and postal code for a specific location in the table.
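Just to illustrate what such a lookup amounts to, here is a minimal sketch using Ruby's csv library. The column names (store_id, postal_code, city, state) and the helper find_store are assumptions; the single sample row reuses the Decatur, AL store that appears in the API response later in this post.

```ruby
require "csv"

# Illustrative table: the column names are assumptions, not the layout of
# any particular CSV file.
STORES_CSV = <<~TABLE
  store_id,postal_code,city,state
  2488,35601,Decatur,AL
TABLE

# Hypothetical helper: returns the first row matching the city and state
# (case-insensitive) as a Hash, or nil when nothing matches.
def find_store(csv_text, city:, state:)
  table = CSV.parse(csv_text, headers: true)
  row = table.find do |r|
    r["city"].casecmp?(city) && r["state"].casecmp?(state)
  end
  row && row.to_h
end

store = find_store(STORES_CSV, city: "Decatur", state: "AL")
```

The store_id and postal_code of the returned row are the two arguments that location_cookie needs.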
Updating a list of Walmart store IDs and locations
Walmart provides several sources to find stores. Data can be populated from one of those sources:
Store Directory
Store Directory contains links on four levels: country, states, cities, and stores. To get the data, iterate over all elements on the specific level and make subsequent requests.
States
Assuming the country is the US, 51 states can be hard-coded. The Walmart front-end requests data from the JSON endpoint https://www.walmart.com/store/electrode/api/store-directory. It accepts the st search parameter.
Example: https://www.walmart.com/store/electrode/api/store-directory?st=AL.
It returns a list of cities. Each city object contains city, and either storeId or storeCount. A city with storeId contains a single store. A city with storeCount contains multiple stores.
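The single-store versus multi-store distinction can be sketched as a small classifier. The field names (city, storeId, storeCount) come from the description above; the response body below is a made-up placeholder, and classify_city is a hypothetical helper.

```ruby
require "json"

# Placeholder response body: the field names follow the post, the values
# are invented for illustration only.
SAMPLE_CITIES_BODY = <<~BODY
  [
    { "city": "Example City A", "storeId": 1001 },
    { "city": "Example City B", "storeCount": 3 }
  ]
BODY

# Hypothetical helper: decides how the stores of a city have to be fetched.
def classify_city(city)
  store_id = city["storeId"]
  store_count = city["storeCount"].to_i # nil.to_i is 0

  if store_id && store_count.zero?
    :single   # fetch and parse the store's HTML page
  elsif store_id.nil? && store_count.positive?
    :multiple # fetch the city's stores as JSON
  else
    :unknown
  end
end

cities = JSON.parse(SAMPLE_CITIES_BODY)
grouped = cities.group_by { |city| classify_city(city) }
```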
Single store in a city
A request to a specific store returns an HTML page. Example: https://www.walmart.com/store/5744.
The store address and postal code should be extracted from the HTML. The store ID is already in the URI.
let postalCode = document.querySelector(".store-address-postal[itemprop=postalCode]").textContent;
let address = document.querySelector(".store-address[itemprop=address]").textContent;
Multiple stores in a city
A request for multiple stores returns a JSON response. For cities with a single store, it responds with an empty array ([]), so we have to parse the HTML instead.
Example request for multiple stores
https://www.walmart.com/store/electrode/api/store-directory?st=AL&city=Decatur
Sample store from the response
{
"displayName": "Neighborhood Market",
"storeName": "Neighborhood Market",
"address": "1203 6th Ave Se",
"phone": "256-822-6366",
"postalCode": "35601",
"storeId": 2488
}
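Note that storeId arrives as a number here, while the cookie example earlier carries the store ID as a string ("4115"). A minimal sketch of normalizing such a store object into the fields needed later could look like this; store_row is a hypothetical helper, and the input is the sample store above.

```ruby
require "json"

# The sample store object shown above.
SAMPLE_STORE_BODY = <<~BODY
  {
    "displayName": "Neighborhood Market",
    "storeName": "Neighborhood Market",
    "address": "1203 6th Ave Se",
    "phone": "256-822-6366",
    "postalCode": "35601",
    "storeId": 2488
  }
BODY

# Hypothetical helper: keeps only the fields needed for the location cookie
# and converts the numeric store ID to a string.
def store_row(store)
  {
    storeId: store["storeId"].to_s,
    postalCode: store["postalCode"],
    address: store["address"]
  }
end

row = store_row(JSON.parse(SAMPLE_STORE_BODY))
```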
Putting it all together
Pseudo-code to collect store IDs and locations for all US states.
// get, parseHTML, and csv are placeholders for an HTTP client, an HTML parser, and a CSV writer.
const STATES = ["AL", "TX", "CA", /* ... */];
let walmartStores = [];

for (let state of STATES) {
  let cities = get(`https://www.walmart.com/store/electrode/api/store-directory?st=${state}`);

  for (let { storeId, storeCount, city } of cities) {
    if (storeId && !storeCount) {
      // Single store in the city: parse the store's HTML page.
      let store = get(`https://www.walmart.com/store/${storeId}`);
      let document = parseHTML(store);
      let postalCode = document.querySelector(".store-address-postal[itemprop=postalCode]").textContent;
      let address = document.querySelector(".store-address[itemprop=address]").textContent;
      walmartStores.push({ postalCode, address, storeId });
    } else if (!storeId && storeCount > 0) {
      // Multiple stores in the city: the directory endpoint returns them as JSON.
      let stores = get(`https://www.walmart.com/store/electrode/api/store-directory?st=${state}&city=${encodeURIComponent(city)}`);
      // concat returns a new array and leaves walmartStores unchanged, so push the items instead.
      walmartStores.push(...stores);
    }
  }
}

csv.write("walmart_stores.csv", walmartStores);
Existing programs to scrape Walmart Stores
A search on GitHub via grep.app shows four relevant repositories:
- Akamai edge workers example. But it contains only 471 Walmart stores.
$ curl -s https://raw.githubusercontent.com/akamai/edgeworkers-examples/master/edgecompute/examples/personalization/storelocator/data/locations.json | jq '.elements[].tags | select(."ref:walmart" != null) | .ref' | wc -l
471
- scrapehero/walmart_store_locator, which scrapes stores by postal codes. But finding a list of actual postal codes turned out to be harder than finding a list of actual US states.
- theriley106/WaltonAnalytics, which is great to extract data from Walmart but not Walmart stores.
- GUI/covid-vaccine-spotter, which scrapes stores by postal codes. But finding a list of actual postal codes turned out to be harder than finding a list of actual US states.
So, I've played with Rust and came up with this (rough) program.
After going through compilation errors, it worked well, thanks to this helpful blog post about async streams in Rust. Every time my program compiled, it actually worked. Fixing compilation errors is hard (for a non-Rustacean), but there was no need to debug the program at runtime, which is great.
Conclusion
Scraping Walmart is fairly easy: the search results page contains inline JSON data for all products.
Update location cookies to specify the location for plain HTTP requests to Walmart.
If you have anything to share, any questions, suggestions, or something that isn't working correctly, feel free to drop a comment in the comment section or reach out via Twitter at @ilyazub_, or @serp_api.
Yours,
Ilya, and the rest of the SerpApi Team.