Downloading Ubuntu Metalinks with aria2c

The v9.04 Ubuntu release happened recently and as always I found myself battling to get the occasional ISO that wouldn’t come down cleanly via BitTorrent.

I thought I’d give the metalink versions a try with aria2c. Unfortunately the Ubuntu metalinks default to a ‘maxconnections’ of one – which, as far as I can tell from the metalink spec, means you’ll only make 1 connection to a server (which will probably end up just being the torrent anyway, as it has the highest priority).

If you naughtily download the metalink file you can, of course, edit the resources section in the XML to be whatever maxconnections you want. I feel justified in doing this because a) I don’t think it will unduly burden the servers and b) I’m doing this to reduce the overall load by providing another mirror alternative, but morally I still feel a bit squeamish about it.
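For anyone who would rather script that edit than do it by hand, here is a minimal sketch in Python. It assumes the metalink file has a resources element carrying a maxconnections attribute, as described above; the sample document, the namespace and the mirror URL are made up for illustration and are not copied from the real Ubuntu file.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses this namespace. Registering it as the default
# only keeps the serialised output free of "ns0:" prefixes.
ET.register_namespace("", "http://www.metalinker.org/")


def set_max_connections(metalink_xml: str, limit: int) -> str:
    """Return the metalink XML with maxconnections set on every resources element."""
    root = ET.fromstring(metalink_xml)
    for elem in root.iter():
        # Strip any "{namespace}" prefix so the match works with or without one.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "resources":
            elem.set("maxconnections", str(limit))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed example document.
SAMPLE = """<metalink version="3.0" xmlns="http://www.metalinker.org/">
  <files>
    <file name="example.iso">
      <resources maxconnections="1">
        <url type="http">http://mirror.example.org/example.iso</url>
      </resources>
    </file>
  </files>
</metalink>"""

if __name__ == "__main__":
    print(set_max_connections(SAMPLE, 4))
```

Save the result to a new file and point aria2c at that instead of the original.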

Anyway, my download speed went from several hundred kilobytes a second via BitTorrent alone to the following:

[#2 SIZE:409.6MiB/695.8MiB(58%) CN:113 SPD:5017.71KiB/s UP:18.86KiB/s(800.0KiB) ETA:58s]

So, it made 113 connections (a big chunk of them were BitTorrent ones obviously), and I ended up getting the file at around 5 MB/sec. Nice!

Spammy User-Agent “Mozilla/4.0 (compatible; MSIE 5.00; Windows 98)” is Probably FlashGet

If you’ve ever run a remotely popular Apache web server you might have used the mod_limitipconn module, which stops people from making too many simultaneous connections from the same IP address. With this module active, anyone trying to make too many connections will get an HTTP 503 Service Temporarily Unavailable error message.

Now, over the years watching log files for downloads from AusGamers, we’ve seen a lot of weird shit. One of the more common problems we’ve seen is a constant stream of spammy requests coming from many users that have all been identifying as “Mozilla/4.0 (compatible; MSIE 5.00; Windows 98)”. I looked at this ages ago and didn’t figure it out then, but after getting annoyed with it again I spent a bit more time today and have since figured out that this is the default User-Agent of the popular download manager, FlashGet.

Why it chooses to identify, by default, as IE running on Windows 98 is a bit beyond me, but it’s something I find annoying, because it made this harder to diagnose than it should have been.

The real problem though is how FlashGet handles 503 errors. The RFC seems to imply that, without a Retry-After header, a client should treat a 503 the way it would a regular 500 error (“server encountered an unexpected condition which prevented it from fulfilling the request”).

What FlashGet does though is re-attempt the download every 3 seconds – whether or not the Retry-After header is present! Every 3 seconds seems unnecessarily aggressive to me, but failing to respect the Retry-After header is the real problem here, as there’s nothing server administrators can do to reduce the number of excess attempts (short of blocking this User-Agent completely – which, realistically, probably isn’t that bad an idea).

This means that, if you’re downloading using FlashGet and you’re using the defaults, the whole time you’re downloading from a server that is using 503s to try to block you from making too many connections, you’re also spamming the server with requests for files every 3 seconds. The default (as of version 1.9.6.1073) is to try to create 5 connections at once.

As a user, you probably don’t give a shit, but it’s a real pain in the ass for people running servers, as it means log files quickly fill up with thousands upon thousands of these attempts over the course of a single download from a single user. Start adding in thousands of users and you quickly end up with a really annoying situation.

What FlashGet should do:

Assuming that I’m right about the above (which I’m relatively confident about after some testing, though not 100%; it’s certainly possible I screwed up or missed something), here are some changes I’d like to see (in increasing order of importance):

1) Change their default User-Agent to identify as FlashGet.
2) Change the default behaviour on 503s to wait longer than 3 seconds. I think 60 seconds is a reasonable “bare minimum”, though I would say the longer the better.
3) Make it respect the Retry-After header. This is super-important.
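To make points 2 and 3 concrete, here is a rough sketch in Python of how a download client could decide how long to wait after a 503. It is only an illustration of the behaviour I’d like, not anything taken from FlashGet; the function name and the 60 second fallback (my suggested minimum from point 2) are my own choices. Retry-After can be either a number of seconds or an HTTP date, so the sketch handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Fallback wait when the server gives no usable Retry-After value.
MIN_WAIT_SECONDS = 60


def seconds_to_wait(headers: Mapping[str, str],
                    now: Optional[datetime] = None) -> float:
    """Return how long to wait before retrying a request that got a 503."""
    # Note: a plain dict lookup is case-sensitive; real HTTP libraries
    # usually hand back a case-insensitive mapping.
    value = headers.get("Retry-After")
    if value is None:
        return MIN_WAIT_SECONDS

    value = value.strip()
    if value.isdigit():
        # Form 1: a whole number of seconds.
        return float(value)

    try:
        # Form 2: an HTTP date such as "Fri, 31 Dec 1999 23:59:59 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the conservative default.
        return MIN_WAIT_SECONDS

    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry straight away.
    return max((when - now).total_seconds(), 0.0)
```

A client would call this once per 503 response and sleep for the returned number of seconds before trying that connection again.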

I have posted this as a suggestion on the official FlashGet forum.

Chrome v1.x Won’t Download Large Files; Fixed in Beta 2.x

A few people still seem unaware of this so just to clarify, Google Chrome (at least the current release build at v1.0.154.53) does not support downloading of files larger than 2^31 bytes (2,147,483,648 bytes, or about 2.1 gigabytes). If you try, you’ll get an error like this:

The webpage at http://syd1.ausgamers.com:88/downloads/1239230399/AOC_US_TRIAL_CLIENT-2-of-7.zip might be temporarily down or it may have moved permanently to a new web address.

If you click the “more information on this error” link, you’ll get this slightly obscure (but still understandable) error message: “Error 8 (net::ERR_FILE_TOO_BIG): Unknown error.” (I tested with this file, which is slightly more than 2.1 GB.)

This is a fairly typical problem in a lot of software that does HTTP downloads, but it is very surprising to see such a simple bug in Chrome.
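I don’t know what the actual cause is inside Chrome, but the usual way software ends up with a limit at exactly 2^31 bytes is by keeping the file size in a signed 32-bit integer. The Python sketch below illustrates that general failure mode only; the helper names are mine and it says nothing definitive about Chrome’s code.

```python
import ctypes

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def as_int32(size_in_bytes: int) -> int:
    """Show what a byte count becomes when squeezed into a signed 32-bit int."""
    # ctypes integer types do no overflow checking, so large values wrap around.
    return ctypes.c_int32(size_in_bytes).value


def fits_in_int32(size_in_bytes: int) -> bool:
    """Return True if the size survives storage in a signed 32-bit int."""
    return 0 <= size_in_bytes <= INT32_MAX


if __name__ == "__main__":
    for size in (INT32_MAX, 2**31):
        print(size, "->", as_int32(size), "fits:", fits_in_int32(size))
```

A file of 2^31 - 1 bytes still fits, while one byte more wraps around to a negative number, which is the sort of thing a size check would reject.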

However, this appears to be fixed in the 2.x beta, so if you’re using Chrome and must have the ability to download large files within it, give that a try.

Screenshot a Web Page from the Command Line

If you ever need to take a screenshot of a website then CutyCapt is probably worth a gander:

CutyCapt is a small cross-platform command-line utility to capture WebKit’s rendering of a web page into a variety of vector and bitmap formats, including SVG, PDF, PS, PNG, JPEG, TIFF, GIF, and BMP.

It doesn’t seem to be able to pick up Flash objects and save them (which makes sense, as it’s just a simple renderer based on WebKit), but it’s still pretty handy.
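If you want to drive it from a script, a small Python wrapper along these lines should do. This is a sketch under a couple of assumptions: that the binary is on your PATH under the name CutyCapt, and that the --url and --out options behave the way I remember from its documentation. The example URL and output file name are placeholders.

```python
import subprocess
from typing import List


def cutycapt_command(url: str, out_path: str, binary: str = "CutyCapt") -> List[str]:
    """Build the argument list for one capture; the output format follows the file extension."""
    return [binary, f"--url={url}", f"--out={out_path}"]


def capture(url: str, out_path: str) -> None:
    """Run CutyCapt and raise CalledProcessError if it exits with an error."""
    subprocess.run(cutycapt_command(url, out_path), check=True)


if __name__ == "__main__":
    # Only prints the command; call capture() to actually run it.
    print(" ".join(cutycapt_command("http://www.example.org/", "example.png")))
```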

Possible New MSN Virus/Trojan/Phishing Attempt

In the middle of a conversation with someone on MSN Live Messenger just now, I got the following URL as a line of text (literally in the middle of the conversation, so the other party had just said something and then this appeared):

NOTE – DO NOT CLICK THIS LINK OR DOWNLOAD THIS FILE:

http://194.0.252.210/SMSZilla.exe

I checked with the other party and they informed me they did not type that and had no idea what it was. Normally in this situation I assume their system is infected, trojaned or otherwise backdoored – although this is a brand new install of Windows less than a few hours old and with very little software installed, so it would be odd.

AVG doesn’t think this SMSZilla.exe file is anything weird (yet). I can’t find its md5 hash anywhere ( 37f13208d63710f88ec66ae0ca2c2c82 ) either.

Edit: after some more testing I saw it again – it actually takes something the other person says and converts their message into this URL (so obviously you never get their original message, just this converted one).

Update: A few hours later the message has changed and it is now sending the following URL:

http://smsfree.us/SMSZilla.Full.exe

The file is different too, the new md5 hash is 211bc2e12563efc7ddc8b04f233da3c9.
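If you have a suspicious file and want to compare it against the two hashes above, something like this Python sketch will do it. The function names are mine; the hashes are the ones quoted in this post.

```python
import hashlib

# The two md5 hashes quoted in this post.
KNOWN_HASHES = {
    "37f13208d63710f88ec66ae0ca2c2c82",
    "211bc2e12563efc7ddc8b04f233da3c9",
}


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(path: str) -> bool:
    """Return True if the file's md5 is one of the hashes listed above."""
    return md5_of_file(path) in KNOWN_HASHES
```

A match only tells you it is the same file I saw; a mismatch doesn’t mean a file is safe, since the payload had already changed once within a few hours.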

This post exists just in case anyone else is searching for the file or hash.

7-Zip / 7z Command Line Compression Method Options

I always forget how to do this and always end up battling Google trying to find the reference for it. This page is a really good reference of all the command line options for 7-Zip, but just for my purposes, here’s how to set the compression level with the -mx switch:

-mx0 : no compression (copy mode)
-mx1 : very low compression (fastest mode)
-mx3 : fast compression
-mx5 : normal compression
-mx7 : maximum compression
-mx9 : ultra compression
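For scripted backups, the switches above can be wrapped up like this. It is a minimal Python sketch that assumes the command line binary is called 7z and is on your PATH (on some systems it is 7za instead); the function name and the archive and file names are placeholders. It only accepts the levels listed above.

```python
import subprocess
from typing import List, Sequence

# The levels listed above, with their descriptions.
LEVELS = {
    0: "no compression (copy mode)",
    1: "very low compression (fastest mode)",
    3: "fast compression",
    5: "normal compression",
    7: "maximum compression",
    9: "ultra compression",
}


def seven_zip_command(archive: str, paths: Sequence[str], level: int = 5,
                      binary: str = "7z") -> List[str]:
    """Build a '7z a' (add to archive) command with the given -mx level."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    return [binary, "a", f"-mx{level}", archive, *paths]


if __name__ == "__main__":
    command = seven_zip_command("backup.7z", ["notes.txt"], level=9)
    print(" ".join(command))
    # To actually create the archive:
    # subprocess.run(command, check=True)
```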

How many emails do you send a day?

After reading this thread about the limit BigPond imposes on sending email on regular user accounts (apparently “it’s no more than 25 electronic messages in a period of 10 consecutive minutes and no more than 100 recipients per single message”) I got interested in how much email I send. I certainly feel like I spend all day reading and writing emails, but when I actually crunched the numbers for the last month, I was surprised at how low the volume was – only 286 emails in 30 days, or about 9.5 a day.

If anyone cares, this data came from hackily parsing the Outlook Express smtp.log file. You can enable it by going into Tools, selecting Options, hitting the Maintenance tab, and checking the “Mail” box at the bottom under troubleshooting. The log file will end up wherever your .dbx files live – if you can’t find it just search your system for smtp.log.
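The hacky parsing boils down to something like the Python sketch below. It is not the exact script I used: it simply assumes each message sent shows up in the log as one line containing the SMTP “MAIL FROM:” command, and the sample lines are made up to show the idea rather than copied from a real smtp.log.

```python
from typing import Iterable


def count_sent_messages(log_lines: Iterable[str]) -> int:
    """Count lines containing the SMTP 'MAIL FROM:' command (one per message sent)."""
    return sum(1 for line in log_lines if "MAIL FROM:" in line.upper())


def count_sent_in_file(path: str) -> int:
    """Count sent messages in a log file, ignoring undecodable bytes."""
    with open(path, "r", errors="replace") as handle:
        return count_sent_messages(handle)


# Hypothetical log lines; the real layout of smtp.log may differ.
SAMPLE_LINES = [
    "[tx] HELO example",
    "[tx] MAIL FROM: <me@example.org>",
    "[rx] 250 OK",
    "[tx] RCPT TO: <you@example.org>",
    "[tx] MAIL FROM: <me@example.org>",
]

if __name__ == "__main__":
    print(count_sent_messages(SAMPLE_LINES))
```

Divide the count by the number of days the log covers to get a per-day figure.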