yum-cron fails to run on EC2 nano instances

Because I hate doing things the easy way, I often try to set up what I think are basic Linux services on boxes with very little memory. This almost always ends up in (my) tears.

In this case, I set up an EC2 t3.nano instance to serve as a legacy FTP server for an integration with a client ERP system. The server has been working perfectly in every single way – except for some reason, yum-cron wouldn’t run.

In /var/log/cron I’d see things like this:

anacron[16639]: Job `cron.daily' locked by another anacron - skipping

There were no other errors I could find anywhere; /var/log/yum just didn’t have any new lines or additions.

I spent a couple days wondering if the cron/anacron setup was correct – I couldn’t see any obvious problems, so eventually just tried invoking it from the console.

Running /usr/sbin/yum-cron manually, it just sat there for a few seconds before reporting ‘Killed’. What was killing it?!

Obviously (in retrospect) it was OOM Killer. Looking in dmesg revealed the following:

[90980.699432] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=yum-cron,pid=7002,uid=0
[90980.708766] Out of memory: Killed process 7002 (yum-cron) total-vm:756616kB, anon-rss:326396kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1160kB oom_score_adj:0

So it turns out yum-cron – which from what I can tell is a relatively simple Python script wrapper around yum – simply exhausts the memory on a t3.nano, with its paltry 512MB and cannot complete before OOM Killer wipes it out.

Adding a simple 1GB swapfile on the server fixes the problem, although the docs recommend not doing this on the EBS storage directly & instead doing it on ‘ephemeral storage instance store volumes’; this comment indicates that it might be slower and/or more expensive due to increased IO, but it seems that t3 instances do not get free access to the instance store.

Looking at the additional complexity of doing swap ‘properly’ on AWS makes it seem pretty compelling, though it basically doubles the cost.

What actually happens when you migrate Google Workspace email to another user

On occasion, you may want to migrate someone’s Google Workspace account to another user – for example, if you have a team member leave and you want to preserve their email.

Google Workspace has a Data Migration service that lets you migrate the data from account to account – although they specifically say that you shouldn’t use it in the above scenario:

If users leave your organization but you want to continue retaining or holding their data with Google Vault, we recommend you assign the users Archived User licenses rather than import their data using the data migration service. For details, go to Preserve data for users who leave your organization.

Data Migration FAQ – “Can I migrate the email data of users leaving my organization to another account?”

Basically, they recommend taking advantage of Archived Users, which is a separate user status that is only available on Business Plus plans or above, and costs AUD$5.20 per user. Might make sense if you’re already on Business Plus, but it’s an expensive change otherwise.

In any case, the migration seems to work fine; it’s relatively easy to use, although with a couple of quick caveats:

  • Aside from the FAQ noting that no labels are applied to migrated email, the docs don’t make it clear what happens when you do the migration – where does the mail end up?
    • The short answer is the mail is just all munged in together with your existing mail.
    • However, all the email will be imported in an Archived state. It preserves read/unread and other labels, but it won’t show up in your inbox.
    • As far as I can tell, the mail is not modified in any way – no extra headers are added or anything, so there’s no obvious way to identify newly imported mail.
    • SO BEWARE: if you import someone else’s mailbox into your own as a test, and then decide you don’t want it there, you’ll be dumpster-diving to clear it all out later. I would recommend first moving all the email in the source account into its own top-level folder, so it gets imported in neatly (though I’m not sure how to do that easily).
  • If you pick the default migration source as a Google Workspace account and you have 2FA set up on the source account, it will fail and tell you to select it as a Gmail account. You’ll then need to follow an OAuth-esque flow to authorise access to the account, pretty much as you’d expect. Not really a problem, just a little annoying when you go through the Workspace flow because it seems to be the obvious way to go, only to have to start again.

Tracking email bounces from AWS Cognito sent by SES

I’ve recently been debugging some issues with customers not receiving account signup emails from the Explorate platform. As is usual in cases like this, these are frustrating to try to diagnose as there are so many points where things can go wrong:

  • was the request to send email triggered properly by our software?
  • once triggered, was the request to send the email successfully received by the email sending system?
  • once successfully received, was the email actually sent?
  • once sent, was it received by the sending mail server, or bounced, or temporarily delayed?
  • once received, was it delivered to the user’s mailbox, or eaten silently by the mail server, or sent through some other internal mail approval process run by the remote server’s IT team, or any other number of weird things?
  • once delivered to the user’s mailbox, did it end up in their actual inbox, or was it filtered into spam or another folder by a local rule?

One of the challenges with software systems that send email is catching some of the error conditions that occur between the servers. A lot of default behaviour seems to be to just ignore a lot of mail errors, especially bounces – if the user doesn’t get the email who cares? But catching bounces turns out to be really useful in a lot of cases.

With AWS Cognito, however, there doesn’t appear to be a simple way through the console to configure it so you can manage bounces, at least if you’re sending with SES.

However, the functionality does exist – you just need to activate it via the CLI (or using some other API).

At its core, the issue is:

  • By default, your SES configuration will not have a Configuration Set set up, which is needed to specify how you want to handled bounces & other mail events.
  • There is no interface in the AWS Cognito User Pools config to specify which Configuration Set you want to apply for emails sent from Cognito.

It’s a pretty simple fix but it requires that you have the AWS CLI installed and set up.

WARNING: Making this change seems to reset several other configuration options in the User Pool!

The fields that unexpectedly changed for me as a result of this update were:

– MFA & verifications: Email verification seemed to be disabled & switched to ‘no verification’ (AutoVerifiedAttributes in the JSON diff).
– Message customizations: email verification message template & user invitation message template were both erased.
– Devices: “Do you want to remember your user’s devices” was set to No.

As a result, I strongly recommend that you make a snapshot of your User Pool configuration JSON before and after so that you can diff them and be aware of any other changes.

(This is apparently intended behavior; you need to provide all the various parameters otherwise stuff will reset to default.)

  1. Go into SES and create the Configuration Set in the appropriate region. Note that I think by default (possibly for everyone?), Cognito is sending from us-west-2 (Oregon), so you may need to switch to this region.

    I recommend checking the following options at the start while testing: Send Reject Delivery Bounce Complaint, but customise as you see fit.
  2. Set up the appropriate notification endpoint. Our mail volume is currently low so we just set it up for SNS delivering email, but if you have high volume and/or plenty of time you will want to send up something more sophisticated so (for example) the bounces can be reported directly into your application.
  3. Apply the Configuration Set to the relevant Cognito user pool:
    1. List all the user pools to find the ID:
      aws cognito-idp list-user-pools --max-results 10

      Output will be something like:
      {     "UserPools": [         {             "Id": "uat-pool",             "Name": "uat",             "LambdaConfig": {},             "LastModifiedDate": "2021-05-27T10:56:53.538000+10:00",             "CreationDate": "2018-06-27T09:40:55.778000+10:00"         },         {             "Id": "prod-pool",             "Name": "prod",             "LambdaConfig": {},             "LastModifiedDate": "2021-10-11T14:48:49.524000+10:00",             "CreationDate": "2021-09-27T14:32:51.703000+10:00"         },     ] } 
    2. Dump the pool’s details to view and confirm it’s the right one, particularly in the EmailConfiguration section – by default there should be no ConfigurationSet set. As noted in the above warning, I strongly recommend dumping this config to a file for comparison later.

      aws cognito-idp describe-user-pool --user-pool-id ap-uat-pool > uat-pool-current-settings.json

      The EmailConfiguration section will look something like this, with your SES ARN and the From address. The notable missing thing is the ConfigurationSet.

      { ... "EmailConfiguration": { "SourceArn": "arn:aws:ses:us-west-2:18941781714:identity/accounts@example.com", "EmailSendingAccount": "DEVELOPER", "From": "ExampleCorp <accounts@example.com>", }, ... }
    3. Update the user pool with the Configuration Set name you created in Step 1. Something like:

      aws cognito-idp update-user-pool --user-pool-id uat-pool --email-configuration="SourceArn=arn:aws:ses:us-west-2:18941781714:identity/accounts@example.com,EmailSendingAccount=DEVELOPER,From=Explorate <accounts@example.com>,ConfigurationSet=SESConfSet"
    4. Dump the pool details again and diff the two files to compare differences. As noted in the warning above, you may find some values have changed that will need to be reset.

      aws cognito-idp describe-user-pool --user-pool-id ap-uat-pool > NEW-uat-pool-current-settings.json
    5. All done. It should be good to test immediately. If you set up SNS email notification, you should now be able to trigger an email from Cognito:
      – if you have Delivery checked in your Configuration Set, you can create a new user and you should get the Delivery notification setting
      – if you have bounce checked, you can create a new user at a known bad email and you should see the bounce notification.

Postgres Query Queries

I’m using Postgres in a production capacity for the first time and have been excited to get my hands on it after decades of MySQL and being yelled at by other nerds for not using Postgres.

Doing some maintenance on a large-ish (~7m rows) database, I was somewhat disappointed by how the default optimiser doesn’t seem to do much with relatively basic sub-queries. I had always thought one of the big weaknesses of MySQL was terrible sub-query performance, and in my head, it was one of the strengths of Postgres.

Example:

SELECT * FROM logs WHERE uuid IN (SELECT uuid FROM logs LIMIT 1)

There is no index/primary key on uuid. This query is ridiculously slow. The optimiser obviously does not magically figure out it can do the sub-query first and just operate on the results.

EXPLAIN says:

  1. "Hash Semi Join (cost=0.13..788186.68 rows=1 width=644)"
  2. " Hash Cond: ((logs.uuid)::text = (logs_1.uuid)::text)"
  3. " -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=644)"
  4. " -> Hash (cost=0.12..0.12 rows=1 width=37)"
  5. " -> Limit (cost=0.00..0.11 rows=1 width=37)"
  6. " -> Seq Scan on logs logs_1 (cost=0.00..769721.60 rows=7034260 width=37)"

Using Common Table Expressions, it is very fast:

WITH logs AS (SELECT uuid FROM logs LIMIT 10)
SELECT * from logs

EXPLAIN says:

  1. Limit (cost=0.00..1.09 rows=10 width=37)
  2. -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=37)

I am sure this is Postgres 101 stuff, but am just mildly disappointed that such a seemingly basic query doesn’t magically Just Work.

Netgear DM200 Broadband Utilisation Reporting with Prometheus

Was debugging Internet performance issues in an office environment recently which was running a Netgear DM200 router – a cheap, slightly dated but still kind of neat little modem that runs BusyBox.

You can ssh to it and run a couple of scripts on it to monitor utilisation.

After some messing around I ended up with this gist, which contains:

  1. a simple shell script one-liner that runs every 10 seconds to pull the interface stats and send them to a PHP script endpoint
  2. the PHP script that collects the stats (just writes them to a file on disk, so requires write permission to data.txt in the same directory as the script)
  3. a super basic PHP script to use as a Prometheus endpoint to scrape for data

wget from Google Drive

A quick bash script to download files from Google Drive with wget.

e.g. save as gget.sh and run like:

./gget.sh 461ZQWJ2y1-zwQbzLToQsKNsLWaLwazfB outputFilename.zip
#!/bin/bash
# https://drive.google.com/file/d/[GDRIVE FILE ID]/view?usp=sharing
fileid=$1
filename=$2
wget --save-cookies=./cookie -L "https://drive.google.com/uc?export=download&id=${fileid}" > /dev/null
wget --load-cookies=./cookie "https://drive.google.com/uc?export=download&confirm=awk '/download/ {print $NF}' ./cookie&id=${fileid}" -O ${filename}

Resolving WordPress “Post-processing of the image failed” error when uploading large images

I just got the following error trying to upload an image that was 2000 pixels wide:

“Post-processing of the image failed likely because the server is busy or does not have enough resources. Uploading a smaller image may help. Suggested maximum size is 2500 pixels.”

This was confusing because the image was clearly smaller than the maximum size in the error message.

Web search for the error reveals a million different possible causes – this page has a few of the obvious ones, but none of them were my problem.

I tried increasingly slightly smaller versions of the file, then quickly realised what the issue was when it started working once the file size got below 1MB – upload limit in PHP was set to 8MB, and this was being reported in WordPress, but it was not set properly in nginx.

If I’d thought to check the nginx error log sooner I would have seen the following error:

... client intended to send too large body: 1191539 bytes

The fix is of course to just add the appropriate directive in nginx config file to align the upload size with what you have in PHP. Something like:

http {
…
    client_max_body_size 8M;
}

Using Google Colaboratory to wget Files Directly into Google Drive from Anywhere (Dropbox, AWS, VPSs, etc)

I often find myself in the situation where I have large files on various online services like Dropbox or various VPSs that I want to backup or otherwise store in Google Drive.

It seems stupid to have to download them to my desktop and then re-upload them. Good news! There’s a way to download straight into Google Drive, using Google Colaboratory.

I stole this trick from somewhere and I use it all the time to upload files directly into Google Drive.

Basically:

  1. Create a new Colab notebook
  2. Add a new cell with the following contents:
    from google.colab import drive
    drive.mount('/content/drive')
  3. Run the code. It will prompt you to authenticate this new ‘application’ against Google Drive so it can access your files.
  4. Create a new cell with the following contents:
    !wget -P /content/drive/My\ Drive https://example.com/test.zip
  5. Run the code. It will download the file, using wget, directly from the requested URL, showing you the live progress.
  6. For bonus points, you can also check the hash of the file once it has downloaded to make sure it has the contents you’re expecting (note here that you can also reference ‘My Drive’ like ‘MyDrive’?):
    !sha1sum /content/drive/MyDrive/test.zip

WordPress SSO with Google Workspace SAML and OneLogin SAML SSO Plugin

I recently embarked on a journey to simplify logins to a client’s network of WordPress sites with Single Sign-On (SSO). I didn’t really care what identity provider was going to be used, but they also used Google Workspace which I knew had support for SAML. I also figured that, being Google Workspace, the support for it would be well established, with lots of nice and simple clicky interfaces.

Nope.

Until this point my only real experience with SSO had been as a consumer; I’ve had many accounts that have implemented some form of SSO (often based on Google accounts) and it has been relatively seamless. But trying to implement it myself from a cold start I found frustrating.

I first tried auth0’s setup with their WordPress plugin, but the auth0 WordPress guide was a little out-of-date. I found the auth0 interface super overwhelming at a glance, and gave up quickly.

Some more searching put me on to the OneLogin SAML SSO plugin (GitHub), which has zero documentation on the WordPress plugins site, in their GitHub, or anywhere else that I can find. The plugin, once installed in WordPress, yields a settings page with a billion different options. While they are mostly well-described, it’s not super-clear what you need to do in order to get up and running, especially with Google Workspace.

Some trial and error and help from this support thread and it was working pretty quickly – although I still don’t know what is going on under the hood, so it will be a while before I decide to use this in any production capacity – so just wanted to document the process as it stands for WordPress and Google Workspace (as of March 2021, anyway).

There are two sides you need to configure to make this work – OneLogin’s WordPress plugin, and the Google Workspace SAML setup.

Starting with the Workspace side:

  1. Broadly, we’re following Google’s own instructions: “Set up your own custom SAML application“.
  2. Log into Workspace Admin, go to Apps, and select “SAML Apps”.
  3. Open the “Add App” dropdown and select “Add custom SAML app”.
  4. Enter whatever for the app name & click “Continue”.
  5. Copy the “SSO URL”, “Entity ID” and “Certificate” fields, taking care to get it all and preserve formatting. (You can download the IdP metadata as well for backup purposes, but you can retrieve this information again easily at any time, so don’t stress.) Click “Continue”.
  6. Now it will ask for your “Service provider details” – “ACS URL” and “Entity ID”. The Learn More link here provides no useful information about what these are or where to get them from – but they come from your WordPress setup.

So now we switch to the WordPress side:

  1. In a new tab/browser, log into your WordPress admin panel and install the OneLogin SAML SSO plugin, and activate it.
  2. Go to Settings->SSO/SAML Settings, which is where this plugin keeps its settings.
  3. At the very top of the page, there is a link: “Go to the metadata of this SP”. Clicking this will open an XML document which has the information needed for the Google Workspaces form.
  4. Two two values we want are as follows (note: both of these values seem to be able to be customised elsewhere in the OneLogin plugin settings):
    1. ACS URL: this is in the tag that looks like this: <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Location="https://example.com/wp-login.php?saml_acs" index="1"/> – we want the value in the Location field.
    2. Entity ID: this is in the very first tag that looks like: <md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" validUntil="2021-03-04T00:22:23Z" cacheDuration="PT604800S" entityID="php-saml">. Default seems to be php-saml.
  5. You can ignore the rest of the fields in the “Name ID” section & just click Continue.
  6. Now we need to configure the Attributes. Basically just replicate the below screenshot (Primary email -> email, First name -> firstname, Last name -> lastname, Primary email -> username).

  7. Click ‘Save’.

Now, back to the WordPress SSO config:

  1. Set “IdP Entity Id” to be the “Entity ID” field that we copied from the Google settings up earlier on.
  2. Set “Single Sign On Service Url” to be the “SSO URL” field.
  3. Set “X.509 Certificate” to have the certificate from the “Certificate” field.
  4. Look for the “Create user if not exists” field. Whether or not you want this checked depends on whether you already have your user accounts set up. It may be easiest, if you’re just trying to get this working at all, to check this and try with an account that doesn’t already exist in WordPress with the same email address.
  5. Look for “Match WordPress account by” and change this to “E-mail”. Google Workspace does not appear to expose any username field (maybe you can make this work with mapping but not sure).
  6. Scroll down to “ATTRIBUTE MAPPING”. As with the Google Workspace-side mapping, we do the same here:
    Username: username
    E-mail: email
    First Name: firstname
    Last Name: lastname
  7. There are tons of other things that you should look at – for example, “Prevent reset password” might be something you want to do to make sure a user can’t accidentally have their WordPress password reset to bring it out of sync with their Workspace account (I suspect in theory this should not impact things as users should not be able to login without going through the SSO, but in case of WordPress bugs or vulnerabilities in plugins or whatever it’s probably safer).
  8. Once you’re ready, scroll back up to the top and check the “Enable” checkbox.
  9. I strongly recommend opening a new private browser session and logging in as admin at this point, just in case any of this blows up access to your admin section.
  10. Then back to the bottom, hold on to your butts, and click “Save Changes”.
  11. You should now be able to log into your WordPress site with your Google Workspace credentials.

This document is a work-in-progress as I figure out more about what is going on; very interested in comments and feedback.

Other notes:

  • Single Logout (SLO) does not seem to be supported by Google Workspace at the moment so there is no easy way to log out of all services at once [that I can see].

Resources:

Copying Google Sheets into Shared Drive Removes Ability to Create Project Script Properties

If you have a Google Sheet with some associated Apps Script code in your personal drive, then you copy it into a shared drive (e.g., your company share), it will strip the project properties, including script properties that you may have (e.g.) set to hold credentials.

This seems like a good plan, to prevent information leaking when scripts are copied around.

Unfortunately, it also seems to permanently remove the ability to add script properties ever again. If you open the Project properties and check the Script properties, the “Add row” link is simply missing.

If you create a new spreadsheet on the shared drive, it will work fine – so in some cases, it might be easy to just copy/paste the spreadsheet info and the app script into an entirely new document, created from scratch.

Unfortunately this didn’t work for my spreadsheet, which had a lot of names ranges, buttons, images, and drawings, which don’t lend themselves to copy/pasting easily.

Been battling this for a while and can’t find a simple solution, although other people seem to have the same problem. There’s also several bugs in Google’s Issue Tracker that seem to be related.

The easiest fix I’ve found is:

  1. Create a new copy of the spreadsheet in my own drive
  2. Delete the scripting from the spreadsheet
  3. Copy the spreadsheet over into the shared drive
  4. Re-add the scripting into the spreadsheet
  5. Add the project properties you need

This preserves all the spreadsheet bits that are frustrating to recreate manually and (for me at least) re-creating the script is a simple copy/paste.