trog's haus – A web page about software development, startups, open source, and other things.

URL Click Interceptors for Windows

A quick list of open source URL interceptors – bits of software that, when you click on a URL, will prompt you for which browser you want to handle the URL – for Windows:

Hurl – C#/.NET
BrowserPicker – C#/.NET
BrowserSelector – C#/.NET
BrowserSelect – C#/.NET
Browsers – Rust
BrowserBarrier – AutoHotkey

A Random Solution to 100% CPU Usage in Windows

Like a zillion people on the Internet, my laptop (a Dell XPS 7590) occasionally starts going nuts with the fans, because a CPU core is pegged at 100% for no obvious reason.

Task Manager/Resource Monitor all uselessly show it’s System, and trying to debug further in Process Explorer just leads you down a rabbit hole of potential Windows problems.

After picking a few of these at complete random I luckily eventually stumbled across something that fixed it – changing the Power Options related to USB settings. Specifically, disabling USB selective suspend setting when the system was plugged in.

This immediately made the 100% CPU pegging go away, solving a problem that had plagued me for over a year. No idea if it was related to USB devices I had plugged in – I had the same problem at work and in the office, where I have completely different equipment.

No idea what the potential impact to this is but I find it hard to believe it’s worse than my CPU spinning randomly.

AWS Transfer Family SFTP Directories Are A Bit Weird

There are still a lot of people out there with SFTP (and even FTP!) based workflows. Amazon know this and have a dedicated product called AWS Transfer Family, which is basically an amazingly expensive SFTP wrapper that lives on top of S3.

If you don’t want the hassle of running SFTP on a $5/mo virtual server, then paying AWS on the order of USD$200/mo might be a good option.

There is some slightly weird behaviour compared to standard SFTP that caught me by surprise relating to directories.

(Note: I am doing this on a client’s SFTP setup, so I don’t know what it actually looks like on the S3 side.)

If you try to rename a file into a directory that does not exist, you will not get an error – it will actually work, and create some sort of “virtual subdirectory” in the S3 bucket. e.g., if you do rename example.txt backup/example.txt, without the backup/ directory existing, and then do a directory listing, you’ll see there is a new backup/ directory that was created by that rename operation.
If you then move the file back – rename backup/example.txt ./example.txt – the backup/ directory will disappear.
If you create the backup/ directory first, and repeat the move in and out, the directory will persist.
If the backup/ directory was created by the rename command, and you then try to do an ls * on the parent directory, it will return the files in backup/ as well – i.e., it will act like a recursive ls.

If you are trying to get closer to standard SFTP-based behaviour with directories, I suspect it’s safer to manually make the directories first (as you would normally) instead of relying on this weird automatic directory creation you get from the rename.

\d vs [0-9] in JavaScript/node.js regex

I was trying to debug while a seemingly very simple regular expression in JavaScript was failing.

The goal was to catch expressions going to an API endpoint that looked like:

/endpoint/145,14,93

The regex I had was working fine in regex101.com’s simulator:

^/endpoint/\d+,\d+(?:,\d+)*$

But running under node.js, it wouldn’t work – it would catch a single digit, but not any subsequent digits.

Spent a while trying different things – mostly assuming I was doing something boneheaded due to my lack of familiarity with node.js. A colleague verified the same thing and also wasn’t sure.

I then realised it worked fine if I replaced the \d with [0-9]. I thought this was weird – the MDN documentation says:

Matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches “2” in “B2 is the suite number”.

… which made me assume they were the same thing.

After much websearching & the usual difficulty in finding meaningful results with search terms like “\d”, in desperation, I thought I’d ask ChatGPT, and got the following result:

The fourth point seems to be the case – the \d is also matching the comma.

I’m sure this is documented somewhere (otherwise how else would ChatGPT know about it?!) but I couldn’t find it referenced in any of the stuff that came up through common search terms.

SEO? More like SE-NO!

[Hilarious alternate title: Dealing with SEO: How to go from SE-No to SE-Oh Yeh.]

I wrote this article back in 2014 for our agency’s blog. It was never published – I suspect our sales & comms people didn’t like it as it conflicted with some of our service offerings.

While it reads a bit dated, I think the core tenets are still more or less correct, and I just thought the title was very funny, so here we are.

SEO? More like SE-NO!

Until social networking came along, search engine optimisation (SEO) was the undisputed king of web buzzwords. If you weren’t doing SEO, then you were crazy and your website – and by extension, your business – was going nowhere.

SEO now has to vie for mindshare against a social media strategy, but it has a long legacy and thus is still heavily entrenched in the minds of anyone who is trying to do anything in the online space.

If you’re running a website for your business, and you’re not a technical person familiar with the intricacies of SEO, you might have been concerned about this – how do you make your website stand out? How do you set things up so that when someone types in the name of your business or your industry, you show up on the first page?

In short – do you need SEO?

Well, vast hordes of SEO specialists, agencies, companies and consultants have sprung up over the preceding years to help answer these questions. Sounds promising, right? In an increasingly knowledge-based economy, it’s obviously helpful to have a bunch of people who have devoted themselves to becoming experts on a topic, so you can leverage their abilities. Great!

Unfortunately, things aren’t great in the world of SEO. Things are messy. Let’s have a look at why.

What is SEO?

First up – what the heck is SEO, anyway? “Search engine optimisation” sounds pretty clear-cut – everyone needs to be on search engines!. But what actually is it? When someone “does SEO”, what exactly are they doing?

The short answer is: it could be anything. “SEO” is not the sort of hard, technical term that is favoured by computer nerds like us. There’s no specification, there’s no regulations, there’s no protocols – there’s not even an FM to R.

In a nutshell, SEO means making changes to a website to improve how search engines react to it. It can be as simple as making sure you have a title on your page – for example, if your business is a coffee shop, you might want to make sure you have the words “coffee shop” somewhere in the title. It can be complicated, too – like running analyses on the text content on your site to measure keyword density.

Changes can also be external. One of the biggest things that impacts a site’s rankings in search results is how many other people on the Internet are linking to you. So one SEO strategy is to run around the Internet and make a bunch of links back to your site. (We’ll talk about this a bit more later.)

Other technical things might influence SEO as well. Google recently announced that whether or not a site used HTTPS (the secure padlock dealie that means your website is secure for credit card transactions) would start having some impact on rankings.

As we can see here, there’s a bunch of different things that can affect your SEO – and I’ve only listed a handful of them. There are more – and they all are interrelated.

As if that wasn’t complicated enough, there’s something else that affects where you end up in search results – the person who is searching. Where you are will change things – if I’m searching for coffee shops, I’m more likely to get results that are geographically closer to me. If I’ve done a lot of searches for certain terms, I’m more likely to to see results based on those terms.

If you have your own website and regularly visit it, it is possible that will affect the rankings as you see them. If you search for yourself you might see your ranking up higher than someone else doing the exact same search located in the next street – or the next town, state, or country.

What’s the practical upshot?

In short: SEO is complicated. There are lots of variables, and they are hard to control.

That’s not even the really bad part: the only people who know exactly how the search ranking system works are the search engines themselves. No matter what you do, the outcome is still 100% determined by whatever is going on with the search engines on any particular day.

No matter what you’re told, anything anyone knows about how to “do SEO” comes from one of two sources: information made publicly available by search engines, and from reverse engineering search engine behaviour by experimentation.

You might invest large amounts of time and effort (and money) in trying to execute a particular SEO strategy, only to have Google come along the next day and announce they’ve changed everything, and your investment is largely wasted.

SEO is a moving target. Things are constantly in flux, in no small part due to the huge number of people attempting to game the system by any means possible – in a world where a top ranking in a competitive search result can mean a huge increase in sales in a very short time, getting an edge is a big deal. And many slightly more nefarious techniques – usually dubbed “black hat SEO” – have emerged, which in many cases can do massive damage to your rankings.

As if all that wasn’t traumatic enough… your ranking is something that evolves over time. A new website won’t appear in search results immediately at all; it might take a few days to show up, and in most circumstances will be low in rankings. If you’re in a competitive space, it might take you months to even register on the first few pages of results.

This means it is very, very hard to do any sort of significant or reliable experiments with SEO in a short timeframe. You can’t change a few words and the instantly check how they affect your rankings. You have to wait – a long time – to see if it has any effect. During that time, there will be any number of other changes that have occurred, making it hard to confirm if your experiment worked.

Doing SEO scientifically is hard. Measuring cause and effect is hard enough in small experiments when there are few variables and they can be tightly controlled. In SEO there are many variables, constantly in flux, known only to the clever engineers that write and evolve the ranking algorithms – the secret sauce that drives how every search engine works.

I said what’s the practical upshot!

Oh, right. Well, the practical upshot is that the world of SEO providers is full of people over-promising and under-delivering.

This is the big risk of paying for SEO services. Because it’s such a vague, hand-waving term that encompasses so many different areas, there are, sadly, a number of operators in the space that use it as an opportunity to provide services that are not quantified or qualified in any meaningful way.

Because of the complexity of the systems involved, it is practically impossible to deliver a promise of results in the SEO world. You might get promised a first page search result, but it is extremely difficult to deliver this, especially in competitive spaces – if you’re trying to get your coffee shop on the first page of Google results for the term ‘coffee shop’, you’ve got a long road ahead of you.

Worse, there are black hat operators that will do things that look like a great idea in the short term, but may end up having huge negative ramifications. “Negative SEO” is one of the more recent examples.

As a result, there are plenty of witch doctors offering SEO snake oil. Promises of high rankings and lack of delivery abound – followed by “oh, we need more money” or “you need to sign up for six months to see results”.

One only needs to look at the SEO sub-forum on Whirlpool – one of the most popular communities in Australia for those seeking technical advice – to see what a train wreck the current SEO market is. At the time of writing there’s a 96 page thread at the top with unsatisfied customers of one particular agency. There are stacks of warnings about other agencies. Scroll through and have a look.

Customers of many SEO agencies are not happy, and it’s because they’re paying for something they don’t really understand without getting crystal clear deliverables.

The situation is so bad that the second sentence on Google’s own “Do you need an SEO?” page states:

Deciding to hire an SEO is a big decision that can potentially improve your site and save time, but you can also risk damage to your site and reputation.

Some other interesting terms used on that page: “unethical SEOs”, “overly aggressive marketing efforts”, “common scam”, “illicit practice”… indeed, the bulk of the document explains all the terrible things you need to watch out for when engaging an SEO.

(I should stress that this is not a general statement that encompasses all those who perform SEO. There are many smart and dedicated people out there that live on the cutting edge of search engine technology, doing only white hat work, delivering great things for their clients. The hard part is finding them in the sea of noise.)

Cool story. What does this mean for me?

Back to the original question – do you need SEO?

There’s no right answer. It’s a big question that encompasses a wide range of stuff. Without looking at your specific situation it’s hard to tell how much effort you should put into SEO at any given point in time.

Remember: there’s no clear-cut magic SEO bullet that will do exactly what you want. But one thing is for sure – someone will happily take your money.

If you decide to engage someone to help optimise your website for search, here’s a quick list of things to pay attention to:

Carefully read Google’s “Do you need an SEO?” document, paying particular attention to the dot points at the bottom.
Establish clear deliverables that you understand – you need to make sure that you know what you’re paying for, otherwise what you get will be indistinguishable from nothing.
Tie any payments (especially ones involving large amounts) to performance metrics – but don’t be surprised if they’re not interested in doing this. (What does that tell you?)
Remember that anything that is not a simple content update that you can do yourself might have other costs – for example, changing page layout or adding new tags might require you to get web developers on board.
If you’re building a new site from scratch, make sure your developers are factoring in SEO right from the outset. Almost any decent developer will be using a framework or software that takes SEO into consideration, and as long as they – and you – are paying some attention to Google’s SEO Starter Guide (EDIT: 2018 version is here: https://support.google.com/webmasters/answer/7451184?hl=en ) you’ll end up in a much better position.
Strongly consider search engine marketing (SEM) instead. SEM is the thing where you pay companies like Google money to have your website appear in search results as ads, based on specific terms. The Google programme – AdWords – gives you incredible control over when your ads appear, and you also get excellent data back from any campaigns. With AdWords you can actually effectively measure the results of your work – so you can scientifically manage your campaigns, carefully tracking how every one of your marketing dollars is performing.

“IF7-Status: blocked” preventing browser requests

Had a recent issue with requests from our front end interface being blocked & surfacing as 403 errors directly to the user in an error message within the web application.

This was weird because we were not seeing corresponding 403s coming from our backend application and couldn’t find anything that would generate a 403.

We were able to get an HAR file from the user that made it look like the 403 was actually coming from the backend – at least, in the HAR file, the endpoint that was being called, /user/me, showed up as responding with a 403 error.

After looking at the requests in more detail, we found the following response header was also (apparently) being returned by the server:

{
"name": "IF7-Status",
"value": "blocked"
},

We’d never seen this header before – it wasn’t present in our code base, including any of our dependencies. Websearch for “IF7-Status” revealed no relevant results.

So we started to wonder if it might be getting injected by some other software on the user’s computer. And of course, that’s exactly what it was – the user has some Internet filtering software installed on their PC called Streamline3 which for whatever reason had something matching this endpoint in their blocklist.

This post exists purely to put the term “IF7-Status” on the Internet in case anyone else runs into this issue.

yum-cron fails to run on EC2 nano instances

Because I hate doing things the easy way, I often try to set up what I think are basic Linux services on boxes with very little memory. This almost always ends up in (my) tears.

In this case, I set up an EC2 t3.nano instance to serve as a legacy FTP server for an integration with a client ERP system. The server has been working perfectly in every single way – except for some reason, yum-cron wouldn’t run.

In /var/log/cron I’d see things like this:

anacron[16639]: Job `cron.daily' locked by another anacron - skipping

There were no other errors I could find anywhere; /var/log/yum just didn’t have any new lines or additions.

I spent a couple days wondering if the cron/anacron setup was correct – I couldn’t see any obvious problems, so eventually just tried invoking it from the console.

Running /usr/sbin/yum-cron manually, it just sat there for a few seconds before reporting ‘Killed’. What was killing it?!

Obviously (in retrospect) it was OOM Killer. Looking in dmesg revealed the following:

[90980.699432] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=yum-cron,pid=7002,uid=0 [90980.708766] Out of memory: Killed process 7002 (yum-cron) total-vm:756616kB, anon-rss:326396kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1160kB oom_score_adj:0

So it turns out yum-cron – which from what I can tell is a relatively simple Python script wrapper around yum – simply exhausts the memory on a t3.nano, with its paltry 512MB and cannot complete before OOM Killer wipes it out.

Adding a simple 1GB swapfile on the server fixes the problem, although the docs recommend not doing this on the EBS storage directly & instead doing it on ‘ephemeral storage instance store volumes’; this comment indicates that it might be slower and/or more expensive due to increased IO, but it seems that t3 instances do not get free access to the instance store.

Looking at the additional complexity of doing swap ‘properly’ on AWS makes it seem pretty compelling, though it basically doubles the cost.

What actually happens when you migrate Google Workspace email to another user

On occasion, you may want to migrate someone’s Google Workspace account to another user – for example, if you have a team member leave and you want to preserve their email.

Google Workspace has a Data Migration service that lets you migrate the data from account to account – although they specifically say that you shouldn’t use it in the above scenario:

If users leave your organization but you want to continue retaining or holding their data with Google Vault, we recommend you assign the users Archived User licenses rather than import their data using the data migration service. For details, go to Preserve data for users who leave your organization.
Data Migration FAQ – “Can I migrate the email data of users leaving my organization to another account?”

Basically, they recommend taking advantage of Archived Users, which is a separate user status that is only available on Business Plus plans or above, and costs AUD$5.20 per user. Might make sense if you’re already on Business Plus, but it’s an expensive change otherwise.

In any case, the migration seems to work fine; it’s relatively easy to use, although with a couple of quick caveats:

Aside from the FAQ noting that no labels are applied to migrated email, the docs don’t make it clear what happens when you do the migration – where does the mail end up?
- The short answer is the mail is just all munged in together with your existing mail.
- However, all the email will be imported in an Archived state. It preserves read/unread and other labels, but it won’t show up in your inbox.
- As far as I can tell, the mail is not modified in any way – no extra headers are added or anything, so there’s no obvious way to identify newly imported mail.
- SO BEWARE: if you import someone else’s mailbox into your own as a test, and then decide you don’t want it there, you’ll be dumpster-diving to clear it all out later. I would recommend first moving all the email in the source account into its own top-level folder, so it gets imported in neatly (though I’m not sure how to do that easily).
If you pick the default migration source as a Google Workspace account and you have 2FA set up on the source account, it will fail and tell you to select it as a Gmail account. You’ll then need to follow an OAuth-esque flow to authorise access to the account, pretty much as you’d expect. Not really a problem, just a little annoying when you go through the Workspace flow because it seems to be the obvious way to go, only to have to start again.

Tracking email bounces from AWS Cognito sent by SES

I’ve recently been debugging some issues with customers not receiving account signup emails from the Explorate platform. As is usual in cases like this, these are frustrating to try to diagnose as there are so many points where things can go wrong:

was the request to send email triggered properly by our software?
once triggered, was the request to send the email successfully received by the email sending system?
once successfully received, was the email actually sent?
once sent, was it received by the sending mail server, or bounced, or temporarily delayed?
once received, was it delivered to the user’s mailbox, or eaten silently by the mail server, or sent through some other internal mail approval process run by the remote server’s IT team, or any other number of weird things?
once delivered to the user’s mailbox, did it end up in their actual inbox, or was it filtered into spam or another folder by a local rule?

One of the challenges with software systems that send email is catching some of the error conditions that occur between the servers. A lot of default behaviour seems to be to just ignore a lot of mail errors, especially bounces – if the user doesn’t get the email who cares? But catching bounces turns out to be really useful in a lot of cases.

With AWS Cognito, however, there doesn’t appear to be a simple way through the console to configure it so you can manage bounces, at least if you’re sending with SES.

However, the functionality does exist – you just need to activate it via the CLI (or using some other API).

At its core, the issue is:

By default, your SES configuration will not have a Configuration Set set up, which is needed to specify how you want to handled bounces & other mail events.
There is no interface in the AWS Cognito User Pools config to specify which Configuration Set you want to apply for emails sent from Cognito.

It’s a pretty simple fix but it requires that you have the AWS CLI installed and set up.

WARNING: Making this change seems to reset several other configuration options in the User Pool!

The fields that unexpectedly changed for me as a result of this update were:

– MFA & verifications: Email verification seemed to be disabled & switched to ‘no verification’ (AutoVerifiedAttributes in the JSON diff).
– Message customizations: email verification message template & user invitation message template were both erased.
– Devices: “Do you want to remember your user’s devices” was set to No.

As a result, I strongly recommend that you make a snapshot of your User Pool configuration JSON before and after so that you can diff them and be aware of any other changes.

(This is apparently intended behavior; you need to provide all the various parameters otherwise stuff will reset to default.)

Go into SES and create the Configuration Set in the appropriate region. Note that I think by default (possibly for everyone?), Cognito is sending from us-west-2 (Oregon), so you may need to switch to this region.

I recommend checking the following options at the start while testing: Send Reject Delivery Bounce Complaint, but customise as you see fit.
Set up the appropriate notification endpoint. Our mail volume is currently low so we just set it up for SNS delivering email, but if you have high volume and/or plenty of time you will want to send up something more sophisticated so (for example) the bounces can be reported directly into your application.
Apply the Configuration Set to the relevant Cognito user pool:
1. List all the user pools to find the ID:
  aws cognito-idp list-user-pools --max-results 10
  
  Output will be something like:
```
{     "UserPools": [         {             "Id": "uat-pool",             "Name": "uat",             "LambdaConfig": {},             "LastModifiedDate": "2021-05-27T10:56:53.538000+10:00",             "CreationDate": "2018-06-27T09:40:55.778000+10:00"         },         {             "Id": "prod-pool",             "Name": "prod",             "LambdaConfig": {},             "LastModifiedDate": "2021-10-11T14:48:49.524000+10:00",             "CreationDate": "2021-09-27T14:32:51.703000+10:00"         },     ] } 
```
2. Dump the pool’s details to view and confirm it’s the right one, particularly in the EmailConfiguration section – by default there should be no ConfigurationSet set. As noted in the above warning, I strongly recommend dumping this config to a file for comparison later.
  
  aws cognito-idp describe-user-pool --user-pool-id ap-uat-pool > uat-pool-current-settings.json
  
  The EmailConfiguration section will look something like this, with your SES ARN and the From address. The notable missing thing is the ConfigurationSet.
```
{ ... "EmailConfiguration": { "SourceArn": "arn:aws:ses:us-west-2:18941781714:identity/accounts@example.com", "EmailSendingAccount": "DEVELOPER", "From": "ExampleCorp <accounts@example.com>", }, ... }
```
3. Update the user pool with the Configuration Set name you created in Step 1. Something like:
  
  aws cognito-idp update-user-pool --user-pool-id uat-pool --email-configuration="SourceArn=arn:aws:ses:us-west-2:18941781714:identity/accounts@example.com,EmailSendingAccount=DEVELOPER,From=Explorate <accounts@example.com>,ConfigurationSet=SESConfSet"
4. Dump the pool details again and diff the two files to compare differences. As noted in the warning above, you may find some values have changed that will need to be reset.
  
  aws cognito-idp describe-user-pool --user-pool-id ap-uat-pool > NEW-uat-pool-current-settings.json
5. All done. It should be good to test immediately. If you set up SNS email notification, you should now be able to trigger an email from Cognito:
  – if you have Delivery checked in your Configuration Set, you can create a new user and you should get the Delivery notification setting
  – if you have bounce checked, you can create a new user at a known bad email and you should see the bounce notification.

Postgres Query Queries

I’m using Postgres in a production capacity for the first time and have been excited to get my hands on it after decades of MySQL and being yelled at by other nerds for not using Postgres.

Doing some maintenance on a large-ish (~7m rows) database, I was somewhat disappointed by how the default optimiser doesn’t seem to do much with relatively basic sub-queries. I had always thought one of the big weaknesses of MySQL was terrible sub-query performance, and in my head, it was one of the strengths of Postgres.

Example:

SELECT * FROM logs WHERE uuid IN (SELECT uuid FROM logs LIMIT 1)

There is no index/primary key on uuid. This query is ridiculously slow. The optimiser obviously does not magically figure out it can do the sub-query first and just operate on the results.

EXPLAIN says:

"Hash Semi Join (cost=0.13..788186.68 rows=1 width=644)"
" Hash Cond: ((logs.uuid)::text = (logs_1.uuid)::text)"
" -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=644)"
" -> Hash (cost=0.12..0.12 rows=1 width=37)"
" -> Limit (cost=0.00..0.11 rows=1 width=37)"
" -> Seq Scan on logs logs_1 (cost=0.00..769721.60 rows=7034260 width=37)"

Using Common Table Expressions, it is very fast:

WITH logs AS (SELECT uuid FROM logs LIMIT 10) SELECT * from logs

EXPLAIN says:

Limit (cost=0.00..1.09 rows=10 width=37)
-> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=37)

I am sure this is Postgres 101 stuff, but am just mildly disappointed that such a seemingly basic query doesn’t magically Just Work.