Downloading a Google Takeout file with curl

I had to download a large (~60GB) Google Takeout file today; asking Google to split the file into chunks of 10GB resulted in this:

I tried to download the file twice in the browser; both times it completed and then vanished from my disk drive. Then I was told I couldn’t download it again. So I had to create an entirely new Takeout.

Needless to say, this was frustrating. Copying the URL and pasting it into wget or curl doesn’t work. There are a bunch of now seemingly useless blog posts and Stack Overflow posts that imply it should work, but I couldn’t get any of them to work.

After some mucking around, what did work for me, as of today’s date, was (in Chrome):

  1. Prepare the Takeout & go through it until you get to the ‘Download data’ image shown above.
  2. Start the download.
  3. Go to the downloads tab and copy the URL there.
  4. Stop the download.
  5. Go back to the Takeout page, open devtools, and refresh it.
  6. Find the first URL to load (the base page). It looked something like this for me:
    https://takeout.google.com/manage?user=xxxxxx&rapt=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  7. Right click that URL and ‘Copy cURL’ for your appropriate OS
  8. Paste that into a notepad or whatever and add a new line (don’t forget to add a new line marker at the end of the previous line – \ on Linux or ^ in Windows.
    -o All mail including Spam and Trash-002.mbox
  9. Paste that into your terminal/cmd/shell & run the curl command.

Postgres Query Queries

I’m using Postgres in a production capacity for the first time and have been excited to get my hands on it after decades of MySQL and being yelled at by other nerds for not using Postgres.

Doing some maintenance on a large-ish (~7m rows) database, I was somewhat disappointed by how the default optimiser doesn’t seem to do much with relatively basic sub-queries. I had always thought one of the big weaknesses of MySQL was terrible sub-query performance, and in my head, it was one of the strengths of Postgres.

Example:

SELECT * FROM logs WHERE uuid IN (SELECT uuid FROM logs LIMIT 1)

There is no index/primary key on uuid. This query is ridiculously slow. The optimiser obviously does not magically figure out it can do the sub-query first and just operate on the results.

EXPLAIN says:

  1. "Hash Semi Join (cost=0.13..788186.68 rows=1 width=644)"
  2. " Hash Cond: ((logs.uuid)::text = (logs_1.uuid)::text)"
  3. " -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=644)"
  4. " -> Hash (cost=0.12..0.12 rows=1 width=37)"
  5. " -> Limit (cost=0.00..0.11 rows=1 width=37)"
  6. " -> Seq Scan on logs logs_1 (cost=0.00..769721.60 rows=7034260 width=37)"

Using Common Table Expressions, it is very fast:

WITH logs AS (SELECT uuid FROM logs LIMIT 10)
SELECT * from logs

EXPLAIN says:

  1. Limit (cost=0.00..1.09 rows=10 width=37)
  2. -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=37)

I am sure this is Postgres 101 stuff, but am just mildly disappointed that such a seemingly basic query doesn’t magically Just Work.

Launchy Alternatives

Launchy is an open source keystroke launcher for Windows. You press a magic key combo (e.g., Win-A, in my case) and a little dialog pops up, in which you can type commands and have them run.

It is very similar to hitting the Windows key in Windows 8.1 or 10 and just typing – it searches through a local database of your files and applications so you can quickly find things and launch or open them.

I am four months into my first Windows 10 machine. Unfortunately, Launchy hasn’t been updated for a while and doesn’t work as gracefully in Windows 10 as I would like:

  • It doesn’t deal gracefully with high-resolution, 4K displays. Fixable with some tweaks though.
  • Starting applications on high resolution displays seems to load them in a weird mode where they are blurry. Probably workaround-able.
  • Win-A is bound to the notification panel on the right hand side, for example, and it doesn’t really deal gracefully with very high resolution displays.

As a result, I have been forcing myself to use the native Windows 10 search, but I find it totally inferior to Launchy in many ways.

Most critically for me, Launchy has some basic tracking on how often you open particular things after you’ve started typing. So it learns, quickly and effectively, what you want to open after you’ve just typed a few letters. Contrast this to Windows 10, where to open my password safe, I have to type pwsafe.exe every single time. (I have to add the `.exe` otherwise it will open the directory in Explorer).

The usability advantages that this confers don’t sound like a big deal, but after something like seven+ years of running Launchy, I find it impossible to go back.

Anyway, I’ve been going through the various other open source alternatives to Launchy that have popped up in the last few years, and thought I’d just jot them down so the next time I go looking, I have a handy list of them.

  • LaunchyQt, a promising-looking fork of the base Launchy, with some more modern features.
  • Wox, probably my next pick. Looks pretty solid.
  • Hain, since discontinued. Built with Electron so probably less lean than some alternatives.

There are some non-open source ones as well that might be worth a look if one is less fussy than I am – Keypirinha, Listary, and Executor.

Can’t Remove Specific Phone Number from Android Contacts

Just had a weird issue where I had an Android contact with an old number on the phone, but it didn’t show up in the web contacts version. Took me a while to realise that you can select ‘View linked contacts’ from the menu in the contact view page in the Android Contacts application to see where external numbers are coming from.

In this case, it was from Signal – Signal maintains its own database of contacts. Turns out zombie contacts in Signal populating your Google Contacts is a common problem, but this Github issue implies there’s no easy UI fix for it in the Signal side.

A modified version of the solution posted by riyapenn worked for me, although I had no saved messages from the contact in question – I suspect if you do it might work differently, so look at the full solution.

Re-sync Signal contacts

1) Go to Android Settings or Android Contacts App > Menu
2) Choose Accounts
3) Choose Signal
4) Tap on the Menu
5) Choose Remove Account (the alert of clearing data is incorrect, your messages will not be deleted)
6) Open Signal
7) Tap on the pencil icon in the blue circle
8) Choose the Menu
9) Tap Refresh

exim4 and apache2-mpm-itk

Migrating from a very old Debian install to a newer VPS with a more recent version of Apache and the mpm-itk mod, I was having problems with sending mail using the standard PHP mail() call (first seen when the WP contact form I was using was throwing a “Sorry, email message could not be delivered” error).

exim4 log reported the following:

unable to set gid=33 or uid=0 (euid=0): forcing real = effective

This thread contained a post indicating the problem was the LimitGIDRange/LimitUIDRange options; it seems if these are not specified there are some defaults (perhaps with very low values, or perhaps it’s just that if it’s not set it will not work at all) that need to be overridden.

Defining these values in the global Apache configuration fixes it.

PHPMailer and Gmail API Mysterious ‘Could not connect’ Error on Windows

I’ve had PHPMailer happily sending email through the Gmail API (as part of a G Suite subscription) for a while now and it mysteriously stopped working yesterday (29th Sep, 2017), throwing the following output with debug enabled:

2017-09-30 11:24:52 SERVER -> CLIENT: 220 smtp.gmail.com ESMTP v2sm1805443wmf.8 - gsmtp
2017-09-30 11:24:52 CLIENT -> SERVER: EHLO trog-pc
2017-09-30 11:24:52 SERVER -> CLIENT: 250-smtp.gmail.com at your service, [86.170.8.39]
250-SIZE 35882577
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
2017-09-30 11:24:52 CLIENT -> SERVER: STARTTLS
2017-09-30 11:24:52 SERVER -> CLIENT: 220 2.0.0 Ready to start TLS
2017-09-30 11:24:52 SMTP Error: Could not connect to SMTP host.
2017-09-30 11:24:52 CLIENT -> SERVER: QUIT
2017-09-30 11:24:52 SERVER -> CLIENT: M I A ��] P *g�� 87� �*��h�!T��
[multiple line binary gibberish removed]
2017-09-30 11:24:52 SMTP ERROR: QUIT command failed: M I A ��] P *g�� 87� �*��h�!T�� [multiple line binary gibberish removed]
2017-09-30 11:24:52 SMTP connect() failed. https://github.com/PHPMailer/PHPMailer/wiki/Troubleshooting
Mailer Error: SMTP connect() failed. https://github.com/PHPMailer/PHPMailer/wiki/Troubleshooting

It looks pretty clearly like a crypto error and the step in the Troubleshooting guide (helpfully provided in the error message!) relating to the OpenSSL check made it seem pretty clear that it was a problem.

The OpenSSL test result looked like this:

C:\files\Apps\OpenSSL>openssl s_client -starttls smtp -crlf -connect smtp.gmail.com:587
CONNECTED(0000019C)
depth=1 C = US, O = Google Trust Services, CN = Google Internet Authority G3
verify error:num=20:unable to get local issuer certificate
---
Certificate chain
0 s:/C=US/ST=California/L=Mountain View/O=Google Inc/CN=smtp.gmail.com
i:/C=US/O=Google Trust Services/CN=Google Internet Authority G3
1 s:/C=US/O=Google Trust Services/CN=Google Internet Authority G3
i:/OU=GlobalSign Root CA - R2/O=GlobalSign/CN=GlobalSign
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEjDCCA3SgAwIBAgIIa79pDvQYxx0wDQYJKoZIhvcNAQELBQAwVDELMAkGA1UE
BhMCVVMxHjAcBgNVBAoTFUdvb2dsZSBUcnVzdCBTZXJ2aWNlczElMCMGA1UEAxMc
R29vZ2xlIEludGVybmV0IEF1dGhvcml0eSBHMzAeFw0xNzA5MTMxNzUyMjVaFw0x
NzEyMDYxNzExMDBaMGgxCzAJBgNVBAYTAlVTMRMwEQYDVQQIDApDYWxpZm9ybmlh
MRYwFAYDVQQHDA1Nb3VudGFpbiBWaWV3MRMwEQYDVQQKDApHb29nbGUgSW5jMRcw
FQYDVQQDDA5zbXRwLmdtYWlsLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBAJBSmxMU1SnUDEN7W9D97mjMOMhbrDPqocU1iVgdUaUuLDDvabwb1PjE
SoSpWZpva+13ZcWnHTvw+oYA+y8yjKALGQopDXjSA9OKu2TdK4gLg05PgCvFHgXd
dOJUkkyctAfthO4oll1/NhFa5w+Juv0p5UI7aSLklFxjH7B4Iv8C85vGa5YPftkt
VOCj0rWHOXeZF14qFsWYcH8azRZKU4ih9S8IoeSgfnMhxJRnjexwVNA5MJ/ig3zT
d0q7pK8usd+rbABshztIKA2AAB0g6NueOVGI4GCT/r8MAKbezi6I5U2Kw5Ja0RJl
e0HTq+dCSc9z5Yb86/+yOEDLy3h2BV8CAwEAAaOCAUwwggFIMB0GA1UdJQQWMBQG
CCsGAQUFBwMBBggrBgEFBQcDAjAZBgNVHREEEjAQgg5zbXRwLmdtYWlsLmNvbTBo
BggrBgEFBQcBAQRcMFowLQYIKwYBBQUHMAKGIWh0dHA6Ly9wa2kuZ29vZy9nc3Iy
L0dUU0dJQUczLmNydDApBggrBgEFBQcwAYYdaHR0cDovL29jc3AucGtpLmdvb2cv
R1RTR0lBRzMwHQYDVR0OBBYEFD3FXTx/EAB0m7BLwZC8B7Bh/+TUMAwGA1UdEwEB
/wQCMAAwHwYDVR0jBBgwFoAUd8K4UJpndnaxLcKG0IOgfqZ+ukswIQYDVR0gBBow
GDAMBgorBgEEAdZ5AgUDMAgGBmeBDAECAjAxBgNVHR8EKjAoMCagJKAihiBodHRw
Oi8vY3JsLnBraS5nb29nL0dUU0dJQUczLmNybDANBgkqhkiG9w0BAQsFAAOCAQEA
DWhdK0rwWV7Q2lBk2oukJBgmptwWsPHtkYjnhjqRXzAwAg6iqXJrf6BUmdgK4Vvp
rj0qeE9kcTudvZwMPVxS7gcjk66v79n2NvE2QBKGZnlUC/4S93jgQVuDZMwmKF3n
ArjDQ0zE2o6wfM3I3yNkzT+/ZxXtrYzhPmRmbVKWTgMSJvWwN6H2T7An+1JXl11A
7Tf5VeiPSI/kvCByw7sezFDRHbnj2uXZz23DymT75zgF/V3Nbzmg3htdlVnyB2Xp
kMKl5swPBrBuui2+et9ZN7vYjZRdHy0jg/PB9lfpdA2CQnIHcq/vIYBzmi+TSms1
vNaIty8ekNsvigjIzn13eg==
-----END CERTIFICATE-----
subject=/C=US/ST=California/L=Mountain View/O=Google Inc/CN=smtp.gmail.com
issuer=/C=US/O=Google Trust Services/CN=Google Internet Authority G3
---
No client certificate CA names sent
Peer signing digest: SHA256
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 3246 bytes and written 468 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES128-GCM-SHA256
Session-ID: 927522CACEB8BB3D0FB305E197235C64D147A4CC26643AB60EB5F110E787FA98
Session-ID-ctx:
Master-Key: 8D698AF5A7790DC4836430F2FA6157B310CF0DDA684B5160BEC643B966E9CCC41598D34D03DA0579893A6CAFB62C2B33
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 100800 (seconds)
TLS session ticket:
0000 - 00 33 97 a3 0b be 7f 8d-47 f3 97 6c 18 bb 43 83 .3......G..l..C.
0010 - 27 a4 f7 01 2c d1 a8 0e-55 a9 3c c3 b3 6f 30 58 '...,...U.<..o0X
0020 - 89 22 e3 29 50 42 18 8e-29 ca be 27 57 f9 bc 6e .".)PB..)..'W..n
0030 - 25 f9 ed 68 6a ba 30 97-60 0b 32 fc 19 ab 83 10 %..hj.0.`.2.....
0040 - 00 d1 91 e7 1d 72 d9 2f-3f 27 ac 06 83 23 78 94 .....r./?'...#x.
0050 - 4d 59 38 7f 5d 70 2e ec-d9 d4 b3 31 c9 34 04 25 MY8.]p.....1.4.%
0060 - 79 a8 2f 49 66 ce c7 e3-67 de 46 58 43 b9 42 36 y./If...g.FXC.B6
0070 - 54 49 33 94 99 1e 7d 0b-87 4c da c5 a4 72 b1 05 TI3...}..L...r..
0080 - 5d 47 3b cf 33 13 69 41-f8 1d e4 a0 81 26 1c e5 ]G;.3.iA.....&..
0090 - a7 6b 9b 09 c8 db 1d 8f-6b 5e 54 eb d7 ed 9e 6c .k......k^T....l
00a0 - fc 1f f9 f8 3a d4 3a df-05 c7 0b a3 0b 66 c1 4e ....:.:......f.N
00b0 - 66 27 3c 64 03 60 81 1d-44 bb f0 a4 08 d0 96 dd f'<d.`..D.......
00c0 - 14 31 95 fd 23 7f 13 82-ed 15 fa fb 6a f5 ec 69 .1..#.......j..i
00d0 - c9 b1 d3 e9 fc .....

Start Time: 1506770618
Timeout : 300 (sec)
Verify return code: 20 (unable to get local issuer certificate)
---
250 SMTPUTF8

At first glance the Troubleshooting guide implies that the ‘unable to get local issuer certificate’ is safe to ignore – but it is only referring to the first instance of the error at the top. If you’re also seeing the error message at the bottom, you have the same problem as me.

The easy fix is to set verify_peer to false as it described in the Troubleshooting guide. But (also as it notes) this is dodgy and you should fix the local certificate store. So don’t do this.

After messing around a bit (including testing identical code on a Linux VM and seeing that it worked), I gritted my teeth and dove into the OpenSSL configuration (something which I’ve studiously avoided for years because everything has magically Just Worked for me).

It looks like OpenSSL didn’t have a local certificate store at all in Windows and it needs to be explicitly configured. I have no idea how it worked at all – maybe it was using some sort of embedded certificate that just expired? Or maybe I had changed some other option somewhere without realising (unlikely but I hate blaming gremlins).

Anyway, the fix is simple:

1) Download the latest cacert.pem file from the curl website

2) Plonk it somewhere on your local machine where PHP can get to it.

3) Update your php.ini’s openssl.cafile directive to point to this new file.

PHP’s OpenSSL should now have the local certificates. The OpenSSL test in the PHPMailer Troubleshooting guide should now “pass” and that final Verify return code: 20 (unable to get local issuer certificate) message should be replaced with Verify return code: 0 (ok). PHPMailer should also happily work again.

Setting File Creation Timestamp from Modified Time

Note to self: here’s how to easily reset the file creation timestamp, replacing it with the file modification date, using PowerShell on Windows:

Get-ChildItem -recurse -filter *.jpg| % { $_.CreationTime = $_.LastWriteTime }

I wanted to do this recently when a bunch of my photos had them broken by a recovery issue. Yoinked from this StackOverflow.

Fix multi-process Firefox “Disabled by accessibility tools” issue

As of Firefox v54.0, multi-process support (also known as Electrolysis) is enabled by default in Firefox.

I had it working through various config tweaks for a few releases before that, but just noticed that at some point it had been disabled.

about:support reported it was disabled, with the issue:

0/1 (Disabled by accessibility tools)

This was a little weird as I didn’t recall having any accessibility tools enabled. I checked the Options and sure enough the option “Show a touch keyboard when necessary” (under Advanced->General) was enabled.

Disabling this option and restarting did not fix it, but then I found this page which carries this note:

accessibility.lastLoadDate – time of the last load. Electrolysis will be disabled for seven days post this time.

I edited this value in about:config – simply winding it back by some random amount (I used 1491307811 but basically anything more than seven days before whatever time it is should do the trick – and restarted Firefox. accessibility.lastLoadDate had vanished from about:config but multi-process mode was enabled again.

Comparing Vision APIs

I spent a few hours this weekend tinkering with cloud-based computer vision APIs as part of a personal project to better classify my photos. I tested the Microsoft Computer Vision API and Google’s Cloud Vision API.

Both were reasonably easy to set up, although the Google one requires a bit more effort and fussing around to navigate their API control panel. Microsoft’s took probably less than 10 minutes from signup to having working code – their process is much simpler; basically register and get an API key and you’re ready. Google requires that you signup, submit payment details, download their SDK, authenticate your account through the SDK via OAuth, and then you can finally try it out.

I had somewhat lower expectations from the Microsoft one based solely on the first thing that I saw when I checked out their home page – one of the examples that is displayed by default includes errors in their OCR:

Microsoft’s API still performs fairly well, although from a quick experimentation it seems that the Google one produces more reasonable results.

Here are some examples:

Microsoft

Landmarks
	Tower of London
Labels:
	sky
	outdoor
	tree
	building
	tall
	roof
	

Google

Landmark: 
	Tower of London
	Tower of London, Jewel House
Labels:
	sky
	building
	landmark
	historic site
	medieval architecture

Google’s tags are a bit more specific than Microsoft’s, but there’s some overlap. Google correctly identifies it as the Tower of London, but incorrectly decides it is Jewel House (it is the White Tower).

Winner: Google

Microsoft

Labels:
	water
	outdoor
	sky
	building
	river
	bridge

Google

Landmark:
        Tower Bridge
Labels:
        bridge
        reflection
        body of water
        waterway
        landmark

Google correctly flags this as the Tower Bridge but almost amazingly (considering how iconic it is), Microsoft does not. Perhaps the colours or darkness are causing issues here. However, the tagging in both is pretty good.

Winner: Google

Microsoft

Labels:
	tree
	outdoor
	sky
	building
	government building
	tower

Google

Landmark:
        St. Paul's Cathedral
Labels:
        sky
        landmark
        tree
        urban area
        woody plant

Again, Google is correctly able to identify the landmark while Microsoft falls short.

Winner: Google

Microsoft

Labels:
	outdoor
	water
	sky
	night

Google

Labels:
        night
        reflection
        landmark
        cityscape
        waterway

Neither of them pick up that this is Big Ben and/or Westminster. The tags are pretty good although I feel Google has a slight advantage for calling it a cityscape.

Microsoft

Labels:
	table
	sky
	wine
	tree
	outdoor
	glass
	beverage
	drink
	alcohol

Google

Labels:
        water
        drink
        beer
        alcoholic beverage
        wine glass	

Both pick up on the alcohol theme, but Google correctly identifies it as beer – although it picks it as a wine glass, perhaps because it’s a slightly unusual shape for a beer glass. Microsoft’s tags however are much more complete.

Winner: Google

Microsoft

Labels:
	manhole cover

Google

Labels:
	circle
	manhole
	manhole cover
	black and white
	stone carving

I have a lot of photos of manhole covers and I am keen to find a way to tag them automatically. Both Google & MS correctly tag this. Google has a bunch of extra detail, although the photo is not actually black and white.

Microsoft

Labels:
	sky
	outdoor
	grass
	mountain
	person
	standing
	nature
	posing
	day
	highland

Google

Labels:
	mountainous landforms
	sky
	mountain
	nature
	cloud

Microsoft has a lot of detail here and importantly correctly identifies it as a “highland” photo. But both are pretty good.

Winner: Microsoft

Microsoft

Labels:
	fence
	tree
	outdoor
	parrot
	animal
	bird
	white

Google

Labels:
        bird
        vertebrate
        purple
        flora
        tree

This is not a parrot, Microsoft. Vertebrate is a bit generic, although it is indeed a bird. Bit of a draw but the tags are still generally useful.

Microsoft

Labels:
	sky
	outdoor
	mountain
	grass
	nature
	hill
	field
	background
	overlooking
	grassy
	hillside
	cloudy
	clouds
	highland
	land
	distance

Google

Labels:
        highland
        sky
        loch
        cloud
        wilderness

Well, Microsoft really throw the kitchen sink at this one, but they’re all accurate. Both correctly tag it with “highland” which is great, but bonus points to Google for “loch”.

Microsoft

Labels:
	person
	outdoor
	man
	standing

Google

Labels:
        photograph
        statue
        monument
        photography
        religion

I was curious to see what it would think about a statue; Google’s tags are clearly more useful than Microsoft’s here.

Winner: Google

Microsoft

Labels:
	grass
	outdoor
	sky
	tree
	building
	field
	farm
	old
	grassy
	pasture
	garden
	lush

Google

Labels:
        grass
        cemetery
        tree
        wall
        historic site	

Microsoft again throwing down as many as possible. Both pretty useful although again Google is the clear winner for picking it as a cemetery.

Winner: Google

Microsoft

Labels:
	building
	outdoor
	tower
	old
	stone

Google

Landmark:
        Broadway Tower
Labels:
        castle
        building
        sky
        tower
        fortification

Google again nail the location and also tag it as a ‘castle’, which is certainly what I would have done. Microsoft’s are OK but again a bit too general.

Microsoft

Labels:
	blurry
	rain

Google

Labels:
        insect
        bee
        honey bee
        macro photography
        membrane winged insect		

Microsoft have no idea what is going on here. Google smashes it.

Winner: Google

Microsoft

Labels:
	sky
	outdoor
	grass
	tree
	cloudy
	clouds
	day
	lush

Google

Landmark:
        Queen's House
Labels:
        cloud
        sky
        city
        daytime
        urban area	

More generally correct stuff from Microsoft, but Google nail it with Queen’s House (although if it had also picked Canary Wharf I would have been doubly impressed).

Winner: Google

On regulating encryption

I wrote a few quick thoughts about the latest aimless flailing around of the politicians of Australia and the United Kingdom as they desperately attempt to appear like they’re doing something about national security by talking about what a scary place the Internet has become.

I make no claim about being a crypto expert but I don’t believe it’s possible to accomplish what they want without either massively compromising the security of everyone by forcing companies to comply and use weaker encryption or fundamentally altering the nature of the Internet and personal computing (perhaps as described in Vernor Vinge’s Huge award-winning science fiction novel, Rainbow’s End).

Anyway, here is a short list of actions required to regulate encryption. Good luck.