As part of a small project to verify backups, I came across a case where I had two photos that looked identical but with different EXIF data.
The backup verification system (correctly) flagged these as two different files – as the SHA1 file hashes were different. However, the actual photo was – as far as I could tell – absolutely identical, so I started looking to see if there was a way to verify JPEG files based on the image data alone (instead of the entire file, which would include meta stuff like the EXIF data).
A quick look around revealed that ImageMagick has a “signature hash” function as part of ‘identify‘, which sounded perfect. You can test it like so:
identify.exe -verbose -format “%#” test.jpg
At first glance this solved the problem, but testing on a few systems showed that I was getting different hashes for the same file – it looked like different versions of ImageMagick return a different hash. I’ve asked about this on their forum and was told that the signature algorithm has changed a few times – which makes it sort of useless if compatibility across platforms is required.
After looking around a bit more for alternative I found the (possibly Australian made?) PHP JPEG Metadata Toolkit, which (amongst many other things) includes a get_jpeg_image_data() function which (so far) seems to work reliably across systems. Pulling the data out and running it through SHA1 gives a simple usable way to hash the image-only data in a JPEG file.