Benjamin Delespierre
Posted on August 24, 2020
TL;DR
Install the lib with composer:
composer require bdelespierre/php-phash
It exposes 2 commands:
vendor/bin/phash generate <image>
vendor/bin/phash compare <image1> <image2>
Perceptual Hashing
Let's say you are developing a social network and you want to prevent people from reposting other people's content. How would you do that?
You can't really use an MD5 checksum: chances are the files you're comparing are slightly different (re-encoded, resized, recompressed), and even a one-byte difference produces a completely different hash.
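To make that concrete, here is a minimal sketch. The two payloads are made-up byte strings standing in for an image and a slightly altered copy of it, not real image files.

```php
<?php
declare(strict_types=1);

// Hypothetical data: two payloads that differ by a single byte.
$original = str_repeat("\x10\x20\x30", 100);
$tweaked = $original;
$tweaked[0] = "\x11";

$md5Original = md5($original);
$md5Tweaked = md5($tweaked);

// Count how many of the 32 hex characters differ between the two digests.
$differing = 0;
for ($i = 0; $i < 32; $i++) {
    if ($md5Original[$i] !== $md5Tweaked[$i]) {
        $differing++;
    }
}

echo $md5Original, PHP_EOL;
echo $md5Tweaked, PHP_EOL;
echo "Differing hex characters: {$differing} / 32", PHP_EOL;
```

The digests tell you the files are not identical, but nothing about how close they are.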
Fortunately, there's a very simple method to determine programmatically whether an image "looks like" another: Perceptual Hashing.
Lucky you, I just wrote a lib to do just that! Here's how it works.
Demonstration
Note: the instructions below are borrowed from Hackerfactor, thanks to them for introducing me to this algorithm.
Step 1: resize the image to 8x8 pixels.
Step 2: reduce the colors to grayscale.
Step 3: calculate the average color of the 64 pixels.
Step 4: iterate over the pixels to compute the bits; if the pixel color is above the average, it's a one, otherwise it's a zero.
Step 5: concatenate the 64 bits to create the hash.
In PHP
Looks a little bit like this:
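// $file is the image path, $size is the side of the thumbnail (8 for a 64-bit hash)
$image = $this->manager->make($file)
    ->resize($size, $size)
    ->greyscale();

// Sum the value of every pixel (in grayscale, one channel is enough)
$sum = 0;
for ($x = 0; $x < $size; $x++) {
    for ($y = 0; $y < $size; $y++) {
        $sum += $image->pickColor($x, $y, 'array')[0];
    }
}

$mean = $sum / ($size ** 2);

// One bit per pixel: 1 if brighter than the mean, 0 otherwise
$bits = "";
for ($x = 0; $x < $size; $x++) {
    for ($y = 0; $y < $size; $y++) {
        $bits .= $image->pickColor($x, $y, 'array')[0] > $mean ? '1' : '0';
    }
}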
You don't have to copy that. Just grab the package.
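If you'd like to check the arithmetic by hand without an image library, steps 3 to 5 can be run on a plain array of grayscale values. This is only a sketch: the function name `averageHashBits` and the 2x2 sample grid are made up for illustration (a real hash would use the 8x8 grid from step 1), and it walks the grid row by row, which is fine as long as every hash is built in the same order.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds the bit string from an already resized,
 * already grayscale image given as a grid of values between 0 and 255.
 *
 * @param int[][] $grid
 */
function averageHashBits(array $grid): string
{
    // Flatten the grid into a single list of pixel values.
    $values = array_merge(...$grid);

    // Step 3: the mean of all pixel values.
    $mean = array_sum($values) / count($values);

    // Steps 4 and 5: one bit per pixel, concatenated into a string.
    $bits = '';
    foreach ($values as $value) {
        $bits .= $value > $mean ? '1' : '0';
    }

    return $bits;
}

$grid = [
    [10, 200],
    [30, 220],
];

// The mean is (10 + 200 + 30 + 220) / 4 = 115, so this prints 0101.
echo averageHashBits($grid), PHP_EOL;
```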
How to compare hashes
Our hash here is simply a bitfield. I expressed it as a string instead of an actual bitfield because it's easier to manipulate, especially for beginners.
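If you need something more compact to store, the bit string can be packed into hexadecimal, 4 bits per character. A minimal sketch; `bitsToHex` is a hypothetical helper written for this post, not something I'm claiming the package exposes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts a string of '0' and '1' characters
 * into its hexadecimal representation (a 64-bit hash gives 16 characters).
 */
function bitsToHex(string $bits): string
{
    // Left-pad with zeros so the length is a multiple of 4.
    $length = (int) (ceil(strlen($bits) / 4) * 4);
    $padded = str_pad($bits, $length, '0', STR_PAD_LEFT);

    // Convert each group of 4 bits into one hex digit.
    $hex = '';
    foreach (str_split($padded, 4) as $nibble) {
        $hex .= dechex((int) bindec($nibble));
    }

    return $hex;
}

echo bitsToHex('00001010'), PHP_EOL;          // 0a
echo bitsToHex(str_repeat('1', 64)), PHP_EOL; // ffffffffffffffff
```

Going nibble by nibble avoids calling `bindec()` on the full 64 bits, which would return a float once the value exceeds `PHP_INT_MAX`.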
Now, to determine how "far" an image is from another, we need to compare their hashes. We can do that using the Hamming distance: the number of positions at which the two bit strings differ.
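$hash1 = phash('images/1.jpg');
$hash2 = phash('images/2.jpg');

// Both hashes have the same length, so either one gives the size
$size = strlen($hash1);

// Hamming distance: count the positions where the bits differ
for ($dist = 0, $i = 0; $i < $size; $i++) {
    if ($hash1[$i] != $hash2[$i]) {
        $dist++;
    }
}
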
$similarity = 1 - $dist / $size;
And voilà! $similarity will be a float between 0 (entirely different) and 1 (exactly the same). You may consider that any value above 0.95 means the images are very close.
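Here is the same comparison wrapped in a function with a worked example. It's a sketch, not the package's API: `hashSimilarity` is a name I made up, and the two hashes are hand-built bit strings rather than hashes of real images.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: similarity between two bit-string hashes,
 * computed as 1 - (Hamming distance / hash length).
 */
function hashSimilarity(string $hash1, string $hash2): float
{
    if ($hash1 === '' || strlen($hash1) !== strlen($hash2)) {
        throw new InvalidArgumentException('Hashes must be non-empty and of equal length.');
    }

    $size = strlen($hash1);
    $dist = 0;

    for ($i = 0; $i < $size; $i++) {
        if ($hash1[$i] !== $hash2[$i]) {
            $dist++;
        }
    }

    return 1 - $dist / $size;
}

// Two hand-built 64-bit hashes that differ in their first 3 bits.
$a = str_repeat('0', 64);
$b = '111' . str_repeat('0', 61);

// 3 of 64 bits differ: 1 - 3/64 = 0.953125
echo hashSimilarity($a, $b), PHP_EOL;
```

With 64-bit hashes, a threshold of 0.95 therefore tolerates at most 3 differing bits, since 4 differing bits already gives 1 - 4/64 = 0.9375.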
Leave a like and comment to tell me what you think.
See you soon for more useful snippets like this one!