Image Data Only Hashing of JPEG Files

As part of a small project to verify backups, I came across a case where I had two photos that looked identical but with different EXIF data.

The backup verification system (correctly) flagged these as two different files – as the SHA1 file hashes were different. However, the actual photo was – as far as I could tell – absolutely identical, so I started looking to see if there was a way to verify JPEG files based on the image data alone (instead of the entire file, which would include meta stuff like the EXIF data).

A quick look around revealed that ImageMagick has a “signature hash” function as part of ‘identify’, which sounded perfect. You can test it like so:

identify.exe -verbose -format "%#" test.jpg

At first glance this solved the problem, but testing on a few systems showed that I was getting different hashes for the same file – it looked like different versions of ImageMagick return a different hash. I’ve asked about this on their forum and was told that the signature algorithm has changed a few times – which makes it sort of useless if compatibility across platforms is required.

After looking around a bit more for alternatives I found the (possibly Australian made?) PHP JPEG Metadata Toolkit, which (amongst many other things) includes a get_jpeg_image_data() function which (so far) seems to work reliably across systems. Pulling the data out and running it through SHA1 gives a simple, usable way to hash the image-only data in a JPEG file.
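
To illustrate the idea, here is a minimal Python sketch that hashes a JPEG while skipping its metadata segments. It is not the toolkit's get_jpeg_image_data() (I have not checked exactly which bytes that function returns), so the hashes will not match the PHP ones. The function name jpeg_image_data_sha1 is made up for this example, and it assumes a well-formed file where "metadata" means the APP0-APP15 and COM segments.

```python
import hashlib
import struct


def jpeg_image_data_sha1(data: bytes) -> str:
    """Return the SHA1 hex digest of a JPEG with APPn/COM segments skipped.

    Assumption: everything from the first SOS marker to the end of the file
    is treated as image data and hashed as-is.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")

    digest = hashlib.sha1()
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before a marker, skip it
            continue
        if marker == 0xDA:
            # Start of scan: hash the rest of the file and stop.
            digest.update(data[pos:])
            return digest.hexdigest()

        # Every other segment before SOS carries a 2-byte big-endian length
        # that counts the length field itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            digest.update(segment)  # quantisation tables, frame header, etc.
        pos += 2 + length

    raise ValueError("no SOS marker found")
```

Calling jpeg_image_data_sha1(open("test.jpg", "rb").read()) on two copies of a photo that differ only in their EXIF block should give the same digest, since EXIF lives in an APP1 segment.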

Terrible Thunderbird v15.x IMAP Performance with AVG

My PC has recently been chugging a lot more than usual – massive disk activity and high CPU utilisation. Looking into it I quickly realised that it was happening whenever Thunderbird received a large bolus of new email – more than 15-20 emails within a minute or two. When I clicked on the folder with the new email, I could see in the status bar at the bottom that Thunderbird was very slowly downloading these new emails, while my disk and CPU went crazy.

Looking further, I noticed in FileMon that AVG was doing a lot of the work. Disabling AVG’s “Resident Shield” during one of these operations almost immediately fixed the symptoms: the email came down much faster and the disk activity and CPU usage returned to normal.

This seemed to happen around the same time as Thunderbird v15.x was released, but I don’t want to declare that the culprit, especially as it is probably the same thing that I noticed with Microsoft Security Essentials that started happening around v11.x. I’m curious if something fundamental changed back then – either internally in Thunderbird, or perhaps within AVG – but it’s certainly possible that I’m just getting a little bit more email now and it’s just tripped my PC over the edge. I assume it has something to do with the way AVG hooks into the disk reading/writing operations – possibly Thunderbird changed something low-level there and it is simply reacting badly with how AVG does its real-time checking.

In any case, if you are experiencing massive slowdowns and system chunkiness using Thunderbird in conjunction with AVG, you can simply temporarily disable the real-time checking when getting a large number of emails. Obviously you probably don’t want to leave it off altogether.

MongoDB Fails Updating on Debian

Every so often there’s a MongoDB update on my Debian VPS that fails. The output of ‘aptitude full-upgrade’ is:

# aptitude full-upgrade
The following partially installed packages will be configured:
mongodb-10gen
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.
Setting up mongodb-10gen (2.0.5) ...
Starting database: mongodb failed!
invoke-rc.d: initscript mongodb, action "start" failed.
dpkg: error processing mongodb-10gen (--configure):
subprocess installed post-installation script returned error exit status 1
configured to not write apport reports
Errors were encountered while processing:
mongodb-10gen
E: Sub-process /usr/bin/dpkg returned an error code (1)
A package failed to install. Trying to recover:
Setting up mongodb-10gen (2.0.5) ...
Starting database: mongodb failed!
invoke-rc.d: initscript mongodb, action "start" failed.
dpkg: error processing mongodb-10gen (--configure):
subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
mongodb-10gen

The update works fine, but mongo just fails to start properly.

The problem in my case is simply that there’s a /var/lib/mongodb/mongod.lock file lying around from some previous process. Deleting that file and re-running the aptitude command will start it properly. (Reminder post because I keep forgetting what the problem is.)
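
For future me, a small Python sketch of the same fix. The helper names are made up, the pgrep check assumes a Linux box with pgrep installed, and the sketch refuses to touch the lock file while a mongod process is running, since in that case the lock is not stale.

```python
import os
import subprocess


def mongod_is_running() -> bool:
    """True if pgrep finds a process named exactly 'mongod'."""
    result = subprocess.run(["pgrep", "-x", "mongod"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def remove_stale_lock(lock_path: str, running: bool) -> bool:
    """Delete lock_path if it exists and mongod is not running.

    Returns True if a file was removed, False otherwise.
    """
    if running or not os.path.exists(lock_path):
        return False
    os.remove(lock_path)
    return True
```

Usage would be something like remove_stale_lock("/var/lib/mongodb/mongod.lock", mongod_is_running()), run as root, followed by re-running the aptitude command.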

AVG on Linux False Positives for NSIS

As of today, we’re seeing what I’m very confident are false positives in AVG running on Linux on our file servers. This has started happening after this morning’s virus database update. The database release we’re using is:

Virus database version: 271.1.1/4927
Virus database release date: Wed, 11 Apr 2012 05:55:00 +10:00

The output of avgscan is:

utils.exe |%name%=Win32/Validace_partial.nsis3|%idn%=0bcfdae664a2c000|=Win32/Validace_partial.nsis3

Files scanned : 1(1)
Infections found : 1(1)
PUPs found : 0
Files healed : 0
Warnings reported : 0
Errors reported : 0

The ‘nsis’ in the output there is presumably referring to the excellent Nullsoft Scriptable Install System (NSIS). The files I’m testing are largely game installers; when cross-checked with a file I built using NSIS it also triggers the false positive.

We are contacting AVG to report this as a probable false positive signature.

Update 3rd May 2012: AVG recommended we update to the 2012 version to fix this issue, which we did – and it fixed the problem.

Thunderbird Freezes When Deleting or Moving Email

I recently updated to the latest Thunderbird (v11.0) and was disappointed to discover that suddenly whenever I was deleting an email or moving it into a different folder, the entire application would freeze for 1-2 seconds while it processed that command.

I am fastidious about email and spend probably more time than I should ensuring everything is filed into appropriate folders (or deleted if I’m never going to look at it again). When you’re getting hundreds of emails a day, deleting and moving needs to be an operation that consumes near zero time, otherwise you’re suddenly spending way more time “doing email” than you should be. As a result, these freezes were massively irritating and caused no end of problems.

I reinstalled Thunderbird, which seemed to fix it temporarily, but before I knew it, it was happening again. I tried rebuilding and compacting folders, all for naught. I tried searching the Thunderbird Bugzilla for similar reports, but I couldn’t see anyone else having the problem.

I put up with this for a while trying various things, but eventually gave up and fired up the incredibly handy FileMon utility from the SysInternals guys to see if anything obvious was happening on the disk side of things that would account for this freeze.

Immediate pay dirt; this chunk of output in FileMon shows the main part of what happened when I tried to move an email into a subfolder of the Inbox:

You can see there the operation started at 4:11:37pm and then the next activity was at 4:11:39pm – two seconds was roughly how long I was seeing Thunderbird freeze for.

Next step was looking at what MsMpEng.exe was – Microsoft Security Essentials. Turns out MSE was installed on my PC as part of a general system policy update at around the same time I upgraded to Thunderbird v11.0.

I tried changing the settings to see if that was indeed the cause – in MSE you just look for the Settings tab, select Real-time protection, and uncheck the ‘Turn on real-time protection’ box. Immediately Thunderbird started behaving normally with no more freezes.

Fortunately there’s an ‘Excluded processes’ option in Microsoft Security Essentials so you can add Thunderbird.exe to the list of processes to skip. This completely fixed the problem for me and now I’m back to moving and deleting emails as fast as ever.

Location-based Advertising Goes Wrong; Clues about Dodgy Advertising

If you’re an astute observer, you might have noticed that some elements on overseas web sites, such as advertising, sometimes refer to the city you’re living in.

It might seem like an astonishing coincidence that an article on the Toronto Times or the South Xihuan Observer just happens to have something like this on their website at the exact same time you just happened to click through from Google… but it isn’t. It is the result of location-based advertising – detecting some information about you from your web browser and figuring out where you are. Usually this is done by your IP address and it is a simple look-up in some database that maintains a list of how geographical locations map to certain IP ranges (colloquially referred to as “GeoIP”).
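
The look-up described above can be sketched in a few lines of Python. The table here is entirely made up (the IP ranges are the ones reserved for documentation, and the cities are arbitrary); a real GeoIP database is just a much bigger version of the same mapping.

```python
import ipaddress

# Hypothetical GeoIP table: (network, region). The ranges are the
# documentation-only blocks, not real allocations.
GEOIP_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Brisbane"),
    (ipaddress.ip_network("198.51.100.0/24"), "Toronto"),
]


def lookup_region(ip: str):
    """Return the region whose network contains ip, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, region in GEOIP_TABLE:
        if address in network:
            return region
    return None


print(lookup_region("192.0.2.55"))   # Brisbane
print(lookup_region("203.0.113.9"))  # None: not in the table
```

The None case is the interesting one: whatever sits on top of the look-up has to decide what to show when it has no idea where you are.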

This is not an exact science, and as this screengrab from msnbc.com shows, sometimes things can go wrong:

This is probably just a simple programming error – the “REGION” tag should have been replaced with my actual region.

This is mostly a fascinatingly boring example of a web site bug.

The only interesting thing is that it clearly highlights that the module with that error is engaging in deception to try to trick you into clicking on it. Clearly, this is not a “new trick in your region” – it is some bullshit generic factoid, presumably about car insurance, that they’re trying to bait you into clicking by implying that it is related to where you live.

There are, of course, other location-based clues in this (rather poor) ad – it has what is pretty clearly a US police department patrol car, and the text of the ad refers to “miles per day” – so hopefully even the casual Australian Internet user would start hearing alarm bells.

While it almost certainly isn’t a scam and probably poses no real “danger”, it’s important for people to be alert for little tricks like this that attempt to change your behaviour by appealing to you by “hitting you at home”, so to speak.

Sogou Search Engine Spider Smashing Websites

I was keeping an eye on CPU usage on a newly provisioned VPS, to which part of AusGamers had recently been transferred, and noticed a big, unusual spike:

Correlating this with another graph indicated it was something hitting our news or forum pages pretty hard, so I nabbed the Apache logs and quickly determined what it was – the “Sogou web spider”, hitting our front page twice a second, over and over again:

199.119.201.102 – - [13/Sep/2011:10:52:16 +1000] “GET / HTTP/1.0″ 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+h
ttp://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – - [13/Sep/2011:10:52:16 +1000] “GET / HTTP/1.0″ 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – - [13/Sep/2011:10:52:17 +1000] “GET / HTTP/1.0″ 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – - [13/Sep/2011:10:52:17 +1000] “GET / HTTP/1.0″ 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – - [13/Sep/2011:10:52:17 +1000] “GET / HTTP/1.0″ 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – - [13/Sep/2011:10:52:18 +1000] “GET / HTTP/1.0″ 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”

… and so on, for a total of 18,763 requests. Eventually it moved on to other pages, but I stopped counting.
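
For the record, this kind of tally is easy to script. Below is a rough Python sketch that counts requests per IP address and user agent. It assumes the log is in Apache's combined format, as in the lines above; the second client in the sample data is made up for contrast.

```python
import re
from collections import Counter

# Apache combined log format (assumed): ip, two ignored fields, [time],
# "request", status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_clients(lines):
    """Count requests per (ip, user agent); lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


sample = [
    '199.119.201.102 - - [13/Sep/2011:10:52:16 +1000] "GET / HTTP/1.0" 301 233 '
    '"http://www.ausgamers.com" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '199.119.201.102 - - [13/Sep/2011:10:52:17 +1000] "GET / HTTP/1.0" 301 233 '
    '"http://www.ausgamers.com" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    # Made-up visitor, only here so there is something to compare against.
    '192.0.2.10 - - [13/Sep/2011:10:52:17 +1000] "GET /news HTTP/1.1" 200 5120 '
    '"-" "ExampleBrowser/1.0"',
]

for (ip, agent), hits in count_clients(sample).most_common():
    print(hits, ip, agent)
```

On a real log you would pass the open file object to count_clients() instead of the sample list; the noisiest client ends up at the top.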

The URL in our logs points to a Chinese-language FAQ which, when run through the awesome translate feature in Chrome, directs you to a form through which you can submit a complaint about “crawling too fast”. I did that (in English) and will be fascinated to see if I get a response.

In the meantime, we just blocked the IP address.

Fixing Double Encoded Characters in MySQL

If you’re working on any old PHP/MySQL sites, chances are at some point you’re going to need to get into the murky, painful world of character encoding – presumably to convert everything to UTF-8 from whatever original setup you have. It is not fun, but fortunately many people have gone through it before and paved the way with a collection of useful information and scripts.

One problem which struck us recently when migrating our database server was certain characters being “double encoded”. This appears to be relatively common. For us, the cause was exporting our data (all UTF-8, but stored in tables declared as latin1) via mysqldump and then importing it again as if it were UTF-8. Roughly what happens is that the export treats the stored bytes as latin1 and converts them to UTF-8, so each byte of an already multibyte character is encoded a second time, and you end up with double encoded characters that show up as squiggly gibberish in all your web pages.
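
If the mechanism is hard to picture, this small Python sketch reproduces it outside of MySQL. The function names are made up for the example, and Python's latin-1 codec is not byte-for-byte identical to MySQL's latin1 (which is closer to Windows-1252), so treat it as an illustration of the idea rather than of MySQL's exact behaviour.

```python
def double_encode(text: str) -> str:
    """Simulate the bug: the UTF-8 bytes of text are misread as latin1."""
    return text.encode("utf-8").decode("latin-1")


def fix_double_encoding(text: str) -> str:
    """Reverse it: turn the characters back into bytes, then read as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


broken = double_encode("café")
print(broken)                       # cafÃ©
print(fix_double_encoding(broken))  # café
```

Note that calling fix_double_encoding("café") on text that was never damaged raises a UnicodeDecodeError, which is a reminder that this kind of repair should only be applied to data you know is double encoded.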

Nathan over at the Blue Box Group has written an extremely comprehensive guide to problems like this. It explains the root cause of these problems, the common symptoms, and – of course, most importantly – precise details on how to safely fix them. If you’re doing anything at all involved in changing character encoding then it is worth a read even before you have problems, just so you can get a better handle on how to fix things and what your end game should be.

There are a few other ways to fix it, of course. The Blue Box solution is comprehensive and reliable but it requires quite a bit of work to get it going, and you also need to know which database table fields you want to work on specifically, so it can be time consuming unless you’re prepared to really sit down and work on it, either to process everything manually or write a script to do it all for you.

Fortunately there’s an easier way, as described here – basically, all you need to do is export your current dataset with mysqldump, forcing it to latin1, and then re-import it as UTF-8:

mysqldump -h DB_HOST -u DB_USER -p --opt --quote-names --skip-set-charset --default-character-set=latin1 DB_NAME > DB_NAME-dump.sql

mysql -h DB_HOST -u DB_USER -p --default-character-set=utf8 DB_NAME < DB_NAME-dump.sql

We did this for AusGamers.com and it worked perfectly. The only caveat you need to be aware of is that it will mess up UTF-8 characters that are properly encoded already. For us this wasn’t a big deal as we were able to clearly identify them and fix them manually.

StackOverflow has yet another approach which might be suitable if you’re dealing with only one or two tables and just want to fix it from the MySQL console or phpMyAdmin or whatever – changing the table character sets on the fly:

ALTER TABLE [tableName] MODIFY [columnName] [columnType] CHARACTER SET latin1;
ALTER TABLE [tableName] MODIFY [columnName] [columnType] CHARACTER SET binary;
ALTER TABLE [tableName] MODIFY [columnName] [columnType] CHARACTER SET utf8;

This method worked fine for me in a test capacity on a single table but we didn’t end up using it everywhere.

Trials and Tribulations of Updating PGP Desktop

I somehow missed the news in April last year that Symantec would be acquiring PGP. Symantec doesn’t exactly have a stellar reputation amongst technical people (my Dell laptop still has some mystical, seemingly uninstallable software components from a Symantec product that was on there when I bought it that I could never get rid of), so I’m sure if I had known about it, it would have filled me with dread.

I found out about it today when I loaded PGP Desktop and realised I hadn’t checked for updates for a while. Normally I haven’t needed to – PGP were pretty good about emailing me about updates. So I opened the application and hit Help->Update. After a split second of thinking, I’m greeted with a dialog telling me: “Product manifest from the PGP Corporation update server fails the integrity check. Please try again later.” I tried again later, same thing, so I did the next step anyone would try when troubleshooting and Googled the error message.

I was directed to this thread on the Symantec forums (never a good sign when the first hits aren’t in some support knowledge base). Fortunately, it had a reply from a Symantec tech support person, so that was good news.

The reply advised users experiencing the problem to download this PDF. Another bad sign. Why isn’t this just linked on a website? Load the PDF and you’re greeted with something that looks like this:

Really? You can’t even get the slashes the right way around in your hyperlinks? Dread level increasing.

Anyway, I tried the process. Went to the URL in point 1 and was told I need to sign up for an account. No worries, makes sense after reading the rest of the document – you get access to a license management section in the Symantec website, so an account seems like a reasonable thing. A relatively painless process; didn’t even need to activate. Tried to log in – more dread:

Augh. Stuck.

I realise that Symantec probably have a bit of work to do as part of the changeover – they say as much in the forum post. But getting software updates seems like enough of a Big Deal to warrant a bit more effort – not to say attention to detail – if they expect corporate customers to want to keep coming back. If I wanted to go to all this effort with desktop encryption software and keeping it up to date, I’d be using GPG.
