The Beginner’s Guide to Browser Fingerprinting For Fraud Detection
Savannah Copland 👋
Posted on January 4, 2021
Website fraud can be incredibly frustrating to deal with, especially for small websites. Fraud comes in many forms, including spam bots filling out forms, fraudsters trying to steal login information, or scammers making fake purchases. What website owners and developers need is the ultimate 'Swiss Army knife' for their fraud-fighting toolkit: browser fingerprinting.
Browser fingerprinting provides a highly accurate user identifier that makes it much easier to triage suspicious traffic. You can identify the visitors most likely to commit fraud either by their past activity or by associating specific patterns of use with a higher likelihood of fraud.
Browser fingerprinting is already used by many companies for developer-led fraud prevention, as it cuts through spoofing attempts to accurately identify users and does so without requiring additional permissions from the user. FingerprintJS has an open source browser fingerprinting library with over 12K stars on GitHub that is used by 8,000+ websites. Fingerprinting techniques on their own have been found to be over 90% accurate in correctly identifying a unique user in the browser, and this accuracy can be improved further when they are combined with usage history, fuzzy matching, and probability engines.
How Fingerprinting Works
Identifying a Vehicle
To explain the technology in an 'ELI5' style, here's an analogy: let’s say you’re a detective in a large city trying to find one specific car suspected of being involved in a crime, as captured by a security camera. To find this car your plan is to go to a busy intersection and take note of all the details of passing cars until you find one that matches the vehicle on the security camera. Ideally, you would like to be able to uniquely identify the car, such that only one vehicle in the city matches your description, otherwise you may have to question multiple drivers.
Let's say the security camera caught some basic details (or signals) about the vehicle. From this, you’ll be able to narrow your search considerably:
- Color (blue)
- Manufacturer (Chevrolet)
- Type of car (truck)
- Model name (Silverado)
- Brand of tires (stock Goodyears)
- Age/year (2015-2021)
With these signals, you may be able to uniquely identify the vehicle right away, especially if any of the specifics are particularly rare. However, in a city with millions of drivers, there may be hundreds of blue Chevrolet Silverado trucks with standard-issue tires. The more standard the combination of signals, the harder it is to get a unique match.
In those cases, you hope that your camera got lucky and captured a more distinctive signal about the vehicle:
- Wood panelling
- Custom logo or paint job
- Rust or damage
- Interior decorations
Any one of these signals may quickly narrow down your search. A blue Chevrolet Silverado truck with a local company’s logo could very well be unique, even in a large city.
It's worth mentioning the most uniquely identifiable element of a car that I have left out so far - the license plate. License plates serve the express purpose of uniquely identifying a car, but what good will they do if the owner removes their plates or swaps them with fakes? It’s important to have a backup for when this method of identification fails.
By assembling a broad and comprehensive set of identifiers, you can narrow the list of suspects and make singling out a bad actor much easier.
Identifying a Visitor
Fingerprinting works in much the same way as the car example above. Only now you are trying to identify a visitor to a website (the suspect) by capturing signals passed via the visitor’s browser (the car) using a fingerprinting function (the security camera).
A lot of signals can be captured through the browser, including:
- User agent details (browser name and version, operating system)
- Hardware details (screen resolution, battery usage, device memory)
- Browser plugins used
- Browser and OS settings
- WebGL parameters
When a new visitor lands on your webpage, the fingerprinting function collects signals and compiles them into a hash that can be stored. Any time this visitor returns, their fingerprint can be compared to past visit history to identify suspicious behavior or fraudulent activity.
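As a rough illustration of "compiling signals into a hash", here is a minimal server-side sketch. It assumes the browser has already sent its signals as a dictionary; the signal names, the sample values, and the choice of SHA-256 over sorted JSON are all assumptions made for this example, not a description of how FingerprintJS computes its identifiers.

```python
import hashlib
import json


def compute_fingerprint(signals: dict) -> str:
    """Return a hex digest for a dictionary of browser signals."""
    # Sort keys so the same signals always serialize to the same string,
    # regardless of the order in which they were collected.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical signal values, for illustration only.
first_visit = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen_resolution": [1920, 1080],
    "device_memory_gb": 8,
    "plugins": ["pdf-viewer"],
    "timezone": "America/Chicago",
}

# The same signals, collected in a different order on a return visit.
return_visit = dict(reversed(list(first_visit.items())))

# A different visitor whose screen resolution differs.
other_visitor = {**first_visit, "screen_resolution": [1366, 768]}

print(compute_fingerprint(first_visit) == compute_fingerprint(return_visit))
print(compute_fingerprint(first_visit) == compute_fingerprint(other_visitor))
```

An exact hash like this changes completely when any single signal changes, which is one reason production systems tend to layer fuzzy matching and visit history on top of it.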
Accuracy
Let's say you are now collecting a 'fingerprint' for every visitor to your website. For that fingerprint to be useful as a method of uniquely identifying visitors, it needs to be highly accurate. The FingerprintJS Pro API has a 99.5% accuracy rate, which means that for every 1,000 visits, 995 are correctly associated with a unique identifier.
The 5 out of 1,000 visits that are not correctly identified are either false positives or false negatives:
- False positive: multiple unique visitors are given the same fingerprint
- False negative: one visitor is given different fingerprints over multiple visits
To reduce false results, your fingerprint should use the right combination of signals to balance both uniqueness and stability. A highly unique signal reduces your chances of a false positive, because two different visitors are less likely to share the same value. A highly stable signal reduces your chances of a false negative, because the same visitor is less likely to show a different value on a later visit.
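To make the two error types concrete, the sketch below counts them in a small, made-up set of labelled visits where the true visitor behind each fingerprint is known. The function name, the visitor names, and the fingerprint strings are hypothetical. Note that it counts affected fingerprints and visitors rather than individual visits, which is a simplification.

```python
from collections import defaultdict


def count_identification_errors(visits):
    """Count fingerprint collisions and splits in labelled visit data.

    `visits` is a list of (true_visitor_id, fingerprint) pairs.
    Returns (false_positives, false_negatives).
    """
    visitors_per_fingerprint = defaultdict(set)
    fingerprints_per_visitor = defaultdict(set)
    for visitor_id, fingerprint in visits:
        visitors_per_fingerprint[fingerprint].add(visitor_id)
        fingerprints_per_visitor[visitor_id].add(fingerprint)

    # False positive: one fingerprint shared by more than one real visitor.
    false_positives = sum(
        1 for ids in visitors_per_fingerprint.values() if len(ids) > 1
    )
    # False negative: one real visitor seen under more than one fingerprint.
    false_negatives = sum(
        1 for fps in fingerprints_per_visitor.values() if len(fps) > 1
    )
    return false_positives, false_negatives


# Made-up labelled data, for illustration only.
labelled_visits = [
    ("alice", "fp_a"),
    ("alice", "fp_a"),
    ("bob", "fp_a"),     # bob collides with alice
    ("carol", "fp_c1"),
    ("carol", "fp_c2"),  # carol's fingerprint changed between visits
    ("dave", "fp_d"),
]

print(count_identification_errors(labelled_visits))
```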
While there are hundreds of signals available via the browser, you may want to avoid using some signals in your fingerprinting function altogether. A signal with both low uniqueness and low stability is shared by many visitors and is likely to change over time or be spoofed frequently, so it contributes little to identification. To return to our car example, this might be whether a car has a dirty windshield - you cannot count on this signal to improve your chances of finding the correct car. In the world of browser fingerprinting, current battery level is a poor signal, so while it is accessible, I would not recommend including it in any fingerprinting function you use.
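One way to decide which signals to keep is to score each one against observed traffic. The sketch below uses two simple, assumed definitions: uniqueness is the share of visitors whose first observed value is shared with no one else, and stability is the share of visitors whose value never changed across visits. Both the metrics and the sample data are hypothetical and only meant to show the idea.

```python
def score_signal(observations):
    """Score one signal from observed values.

    `observations` maps a visitor id to the list of values seen for that
    visitor over successive visits. Returns (uniqueness, stability),
    each between 0 and 1.
    """
    first_values = [values[0] for values in observations.values()]
    # Uniqueness: visitors whose first value nobody else shares.
    unique = sum(1 for value in first_values if first_values.count(value) == 1)
    # Stability: visitors whose value stayed the same on every visit.
    stable = sum(1 for values in observations.values() if len(set(values)) == 1)
    total = len(observations)
    return unique / total, stable / total


# Made-up observations, for illustration only.
battery_level = {
    "alice": [0.9, 0.4],
    "bob": [0.9, 0.7],
    "carol": [0.5, 0.5],
}
screen_resolution = {
    "alice": ["1920x1080", "1920x1080"],
    "bob": ["1366x768", "1366x768"],
    "carol": ["2560x1440", "2560x1440"],
}

print(score_signal(battery_level))
print(score_signal(screen_resolution))
```

In this toy data, battery level scores low on both measures while screen resolution scores high on both, which matches the intuition that a frequently changing value adds noise rather than identifying power.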
The Case for Cookies
Special consideration should be given to highly unique identifiers that are not always available for user identification purposes. The most ubiquitous example of this is cookies.
Cookies work by storing a unique identifier hash in the browser when a visitor first lands on your website. When a visitor has a cookie that matches a previous visit record in your database, you can be certain that these two visitors are the same. However, cookies are a very easy identifier for a visitor to conceal:
- Cookies can be cleared in browser settings
- Ad blockers can disable cookies by default
- Visitors can revoke consent to cookies under GDPR or CCPA
In these cases, instead of including a cookie as an identifier in your fingerprinting function, it can be more useful to use logic to determine when to use cookies as your identifier:
- If cookie matches a previous record: use cookie
- If no cookie matches previous record: use fingerprint
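The cookie-first rule above can be sketched as a small lookup function. The function name, the in-memory dictionaries standing in for a database, and the extra "new" result for a visitor who matches neither record are assumptions added for this example.

```python
from typing import Dict, Optional, Tuple


def resolve_visitor_id(
    cookie_id: Optional[str],
    fingerprint: str,
    known_cookies: Dict[str, str],
    known_fingerprints: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (visitor_id, method) using the cookie first, then the fingerprint."""
    # If the cookie matches a previous record: use the cookie.
    if cookie_id is not None and cookie_id in known_cookies:
        return known_cookies[cookie_id], "cookie"
    # If no cookie matches a previous record: fall back to the fingerprint.
    if fingerprint in known_fingerprints:
        return known_fingerprints[fingerprint], "fingerprint"
    # Neither identifier is known, so treat this as a first visit (assumption).
    return None, "new"


# Hypothetical stored records, for illustration only.
known_cookies = {"cookie-123": "visitor-1"}
known_fingerprints = {"fp-abc": "visitor-1", "fp-def": "visitor-2"}

print(resolve_visitor_id("cookie-123", "fp-zzz", known_cookies, known_fingerprints))
print(resolve_visitor_id(None, "fp-def", known_cookies, known_fingerprints))
print(resolve_visitor_id("stale-cookie", "fp-new", known_cookies, known_fingerprints))
```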
One of the main advantages of fingerprinting is that it is stateless. A well-implemented fingerprint can remain stable through multiple sessions, incognito browsing, uninstalling or reinstalling apps, or clearing cookies. For that reason, using the two methods in conjunction with one another can give higher accuracy than either identification method alone.
FingerprintJS Pro achieves its high rate of accuracy by using fingerprinting, cookies and additional machine learning techniques that incorporate IP address and geolocation. One challenge is keeping up with changes in available signals as new browser versions are released. Anytime Chrome or Safari is updated, for example, identification techniques need to be re-evaluated to determine if further tweaks need to be made to keep accuracy high. The team at FingerprintJS is constantly looking to improve our accuracy by iterating on the signals, algorithms, and techniques used.
Fraud Applications For Fingerprinting
An important thing to keep in mind when dealing with fraud is that only a small percentage of visitors are responsible for the majority of fraud cases. You will need to find ways to isolate these fraudulent visitors, verify their identity through authentication, and blacklist them as needed. However, you will want to avoid putting up roadblocks for your ‘trusted’ traffic, as additional authentication can be detrimental to user experience. You don't want to be slowing your users’ ability to access their account, make purchases, and engage with your website.
Let's explore one example of online fraud to see how you could use fingerprinting in a flexible way to isolate fraud and keep your website experience seamless.
Account takeover is a common form of fraud where malicious users try to log in to other users’ accounts, and is an excellent use-case for fingerprinting technology. Additional security at login can make account takeover much more difficult, though the type of authentication used may depend on the suspicious behavior your website experiences most often:
- For bot or brute force attacks (one user or a network of bots trying many combinations of usernames/passwords):
  - Show a captcha after 1 unsuccessful login attempt on a fingerprint.
  - Lock the user out of attempting login after 5 unsuccessful attempts on a fingerprint.
- For phished accounts (a fraudster obtained someone else’s legitimate login information through a scam or social engineering):
  - Require two-factor or email authentication when attempting to log in with a new fingerprint.
  - Blacklist specific visitors from your site based on their fingerprint.
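The login rules above could be wired together roughly as follows. This is a minimal in-memory sketch: the `LoginGuard` class, its method names, and the returned check labels are hypothetical, and resetting the failure count after a successful login is a simplifying assumption that a real system would likely replace with time-based expiry.

```python
CAPTCHA_AFTER_FAILURES = 1   # show a captcha after 1 failed attempt
LOCKOUT_AFTER_FAILURES = 5   # lock out after 5 failed attempts


class LoginGuard:
    """Decide which extra checks a login attempt needs, keyed by fingerprint."""

    def __init__(self):
        self.failed_attempts = {}  # fingerprint -> failed login count
        self.trusted = {}          # username -> fingerprints seen on past logins
        self.blocked = set()       # blacklisted fingerprints

    def required_checks(self, username, fingerprint):
        if fingerprint in self.blocked:
            return ["blocked"]
        failures = self.failed_attempts.get(fingerprint, 0)
        if failures >= LOCKOUT_AFTER_FAILURES:
            return ["locked_out"]
        checks = []
        if failures >= CAPTCHA_AFTER_FAILURES:
            checks.append("captcha")
        # A fingerprint never seen for this account triggers two-factor auth.
        if fingerprint not in self.trusted.get(username, set()):
            checks.append("two_factor")
        return checks

    def record_failure(self, fingerprint):
        self.failed_attempts[fingerprint] = self.failed_attempts.get(fingerprint, 0) + 1

    def record_success(self, username, fingerprint):
        # Simplification: a successful login clears the failure count.
        self.failed_attempts.pop(fingerprint, None)
        self.trusted.setdefault(username, set()).add(fingerprint)


guard = LoginGuard()
print(guard.required_checks("alice", "fp-home"))   # new fingerprint for alice
guard.record_success("alice", "fp-home")
print(guard.required_checks("alice", "fp-home"))   # now a known fingerprint
guard.record_failure("fp-bot")
print(guard.required_checks("alice", "fp-bot"))    # one failure, unknown fingerprint
```

Because every rule is keyed on the fingerprint rather than on a cookie or IP address, clearing cookies or switching networks does not reset the counters in this sketch.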
For each of these cases, the type of authentication needed can be incorporated into your website using existing workflows, without having to fundamentally change the architecture of your site.
It is also important to note that users intending to commit fraud are much more likely to use techniques to conceal their identity, including using incognito mode, VPNs, and disabling cookies. These are the cases where fingerprinting especially shines, as it can associate these users without needing easily concealed identifiers like cookies and IP addresses.
Browser vs. Device Fingerprinting
The FingerprintJS open source library as well as the Pro API are intended for browser fingerprinting - they can accurately identify visitors to a website using all modern mobile and desktop browsers. However, if you want to identify users of a native mobile app, you will need to use a device fingerprinting function that is made specifically for each mobile operating system. The signals available for mobile app developers are different from signals that can be retrieved in the browser, and vary between iOS, Android, and other mobile operating systems.
The FingerprintJS team recently launched Fingerprint Android, our first open source library for identifying unique Android devices. You can read more about how our Fingerprint Android library works in our explainer article.
Get Involved
I would love to hear your questions and get feedback from the developer community on our fingerprinting technology.
Here are a few ways you can get involved:
- Star, follow or fork our GitHub projects: FingerprintJS (browser fingerprinting) and Fingerprint-Android
- Need more accurate browser fingerprinting for your business? Try FingerprintJS Pro for 99.5% fingerprinting accuracy
- Email us your questions
- Sign up for our newsletter for updates