How To Turn Your VPN Into A Proxy Using Python
Tom Hughes
Posted on July 2, 2020
I love scraping data.
I can write a script that, in a few seconds, pulls data from a site, filters out all the HTML tags and JavaScript mumbo jumbo, and spits out exactly the data I want in a beautiful, usable format (preferably JSON).
Without web scraping, that would take me HOURS of copying and pasting.
One frustrating part about web scraping, though, is that site owners generally don't want you scraping their site, which is totally fair enough.
However, if you're still hell-bent on web scraping, you can use what's known as a 'proxy' to hide your IP address.
This makes it much harder for websites to stop you scraping them.
A proxy works by tunnelling all your requests through a separate server.
To the site owner, it looks like the separate server is making the request, and it is. It then relays the site's response right back to you. Sneaky!
Today I'm going to show you how to use any commercial VPN (NordVPN, ExpressVPN etc) with the requests library in Python to level up your web scraping game.
First off, we're going to import the libraries we want to use. In this tutorial we're just going to use the requests library.
import requests
Using a proxy with the requests library follows this structure:
requests.get(url, proxies=proxy)
That's it. How damn easy is that!
So what is that 'proxy' object we passed into the get function?
The proxy object is a dictionary that maps each protocol (http, https, ftp etc.) to a specific proxy in the following format:
proxy = {
    'http': "username:password@host",
    'https': "username:password@host"
}
Now we just need to fill in the blanks here. I'm using NordVPN, but any popular VPN service will work (ExpressVPN, Surfshark etc.).
Your username and password will be the same ones you use to log in to your VPN.
Notice that in the proxy string, the characters : and @ separate the username, password and host. If your username or password contains either of these characters, the string can't be parsed reliably and the proxy won't work.
For this reason we need to URL-encode (percent-encode) our username and password. For reference, @ becomes %40 and : becomes %3A.
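Rather than encoding by hand, the standard library can do it. The snippet below is a minimal sketch using urllib.parse.quote. The helper name encode_credential and the sample values are made up for illustration.

```python
from urllib.parse import quote

def encode_credential(value):
    """Percent-encode a username or password for use in a proxy string."""
    # safe="" makes quote() encode every reserved character,
    # including ':', '@' and '/'.
    return quote(value, safe="")

print(encode_credential("tom@gmail.com"))  # tom%40gmail.com
print(encode_credential("pa:ss@word"))     # pa%3Ass%40word
```

Passing safe="" matters here: by default quote() leaves / untouched, which is fine for paths but not for credentials.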
Now we just need to fill in the 'host' part of the string.
On your VPN provider's website there should be a section that lists all their servers. With NordVPN, there's a 'servers' link on the homepage that gives you all the information you need.
Using the above information, the host we're going to use is au473.nordvpn.com.
So our full proxy object becomes:
proxy = {
    'http': "tom%40gmail.com:password123@au473.nordvpn.com",
    'https': "tom%40gmail.com:password123@au473.nordvpn.com"
}
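If you want to double-check that a proxy string splits into the pieces you expect, you can parse it back with the standard library. This is only a sanity-check sketch. The helper name split_proxy is hypothetical, and the http:// prefix is added purely so that urlsplit recognises the username:password@host part.

```python
from urllib.parse import urlsplit, unquote

proxy_string = "tom%40gmail.com:password123@au473.nordvpn.com"

def split_proxy(proxy_string):
    """Return (username, password, host) decoded from a proxy string."""
    # urlsplit only treats "user:pass@host" as credentials plus host
    # when it follows a scheme and "//".
    parts = urlsplit("http://" + proxy_string)
    # The username and password come back still percent-encoded,
    # so decode them.
    return unquote(parts.username), unquote(parts.password), parts.hostname

print(split_proxy(proxy_string))
# ('tom@gmail.com', 'password123', 'au473.nordvpn.com')
```

If the decoded username or password doesn't match what you log in with, the encoding step went wrong somewhere.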
These aren't my real login details, but you knew that.
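Putting it all together, we get:
import requests

# Placeholder credentials; the '@' in the username is percent-encoded as %40
proxy = {
    'http': "tom%40gmail.com:password123@au473.nordvpn.com",
    'https': "tom%40gmail.com:password123@au473.nordvpn.com"
}

# Route this request through the VPN provider's server
response = requests.get('https://google.com', proxies=proxy)
print(response.status_code)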
And that's it! Every request you pass this proxy object to will now LOOK like it's coming from NordVPN. Cool, huh?
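One way to convince yourself the proxy is really in use is to ask an IP echo service what address it sees, with and without the proxy. The sketch below assumes httpbin.org/ip is reachable (it returns JSON with an 'origin' field). The function names public_ip and proxy_is_active are made up for illustration, and the credentials are the same placeholders as above.

```python
import requests

# Placeholder credentials and host, as in the example above.
proxy = {
    'http': "tom%40gmail.com:password123@au473.nordvpn.com",
    'https': "tom%40gmail.com:password123@au473.nordvpn.com",
}

def public_ip(proxies=None, timeout=10):
    """Return the IP address that httpbin.org sees for this request."""
    response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.json()['origin']

def proxy_is_active(proxies):
    """True if the proxied request reports a different IP than a direct one."""
    return public_ip(proxies) != public_ip()
```

With real login details filled in, proxy_is_active(proxy) should return True when the site sees the VPN server's address instead of yours. The timeout is there so a dead or misconfigured proxy fails quickly rather than hanging the script.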
We've managed to turn any VPN service into a proxy with a few short lines of code.
Hopefully you've learnt something new today :)
If you want to be EVEN more stealthy when web scraping, I'll be writing more articles here on the topic, so be sure to follow me to stay updated!