logrotate causing logs to only log to .log.1 after upgrade to Debian 10

Had an issue recently on a relatively old Debian server that had been through a distribution upgrade between major releases.

Months after the upgrade, I realised that at least some of the rsyslog-managed logs were having new lines written to the .log.1 file rather than the .log file – for example, any new SSH logging was written to /var/log/auth.log.1, not /var/log/auth.log as I expected.

This broke a few things – fail2ban stopped blocking SSH attempts, because it was monitoring auth.log and not auth.log.1, and a Grafana visualisation showing SSH login attempts broke because it was parsing a file that never changed.

The problem appears to be in logrotate. The config file /etc/logrotate.d/rsyslog looks something like this:

...
/var/log/messages
{
        rotate 4
        monthly
        missingok
        notifempty
        compress
        delaycompress
        sharedscripts
        postrotate
                invoke-rc.d rsyslog rotate > /dev/null
        endscript
}

I finally noticed there was an rsyslog.dpkg-dist file in there which looks like this:

...
/var/log/messages
{
        rotate 4
        monthly
        missingok
        notifempty
        compress
        delaycompress
        sharedscripts
        postrotate
                /usr/lib/rsyslog/rsyslog-rotate
        endscript
}

Changing the postrotate line to match the .dpkg-dist version seems to have fixed the issue. Looking at the new script, it seems to check for a systemd directory, so I guess some core config changed at some point between whatever ancient distribution this was originally running (Debian 8, I think) and the current one (Debian 10).
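
If you don’t want to wait a month for the next rotation to confirm the fix, logrotate can be dry-run and forced by hand – a quick sanity-check sketch:

# Compare the live config against the packaged version that arrived with the upgrade
diff /etc/logrotate.d/rsyslog /etc/logrotate.d/rsyslog.dpkg-dist

# Dry run: show what logrotate would do without touching anything
logrotate --debug /etc/logrotate.d/rsyslog

# Force a rotation, then log a test auth message and check it lands in the new auth.log
logrotate --force /etc/logrotate.d/rsyslog
logger -p auth.info "post-logrotate test"
tail -n 1 /var/log/auth.log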

Downloading a Google Takeout file with curl

I had to download a large (~60GB) Google Takeout file today; asking Google to split the file into chunks of 10GB resulted in this:

I tried to download the file twice in the browser; both times it completed and then vanished from my disk drive. Then I was told I couldn’t download it again. So I had to create an entirely new Takeout.

Needless to say, this was frustrating. Copying the URL and pasting it into wget or curl doesn’t work. There are a bunch of now seemingly useless blog posts and Stack Overflow posts that imply it should work, but I couldn’t get any of them to work.

After some mucking around, what did work for me, as of today’s date, was (in Chrome):

  1. Prepare the Takeout & go through it until you get to the ‘Download data’ image shown above.
  2. Start the download.
  3. Go to the downloads tab and copy the URL there.
  4. Stop the download.
  5. Go back to the Takeout page, open devtools, and refresh it.
  6. Find the first URL to load (the base page). It looked something like this for me:
    https://takeout.google.com/manage?user=xxxxxx&rapt=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  7. Right click that URL and ‘Copy as cURL’ for your appropriate OS.
  8. Paste that into a notepad or whatever and add a new line with the output filename, quoted because of the spaces (don’t forget to add a line-continuation marker at the end of the previous line – \ on Linux or ^ on Windows):
    -o 'All mail including Spam and Trash-002.mbox'
  9. Paste that into your terminal/cmd/shell & run the curl command.
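
The end result is just the big command Chrome copied for you, with the output file tacked on the end – schematically it looks something like this (every -H header is whatever ‘Copy as cURL’ gave you; the filename is the one from my Takeout):

curl 'https://takeout.google.com/manage?user=xxxxxx&rapt=xxxxxxxxxx' \
  -H 'cookie: ...' \
  -H 'user-agent: ...' \
  -o 'All mail including Spam and Trash-002.mbox'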

The Mysteries of Remote Desktop

Looking at some aspects of Remote Desktop & trying to create a more current general reference covering several issues, including:

  • Can you get Google Credential Provider for Windows (GCPW) working with Windows Server 2022 and Remote Desktop?
  • How does Guacamole work with Remote Desktop?
  • What happens if Network Level Authentication (NLA) is on or off?
  • How does the Remote Desktop app for Android/Chromebook work in different scenarios?

I am currently trying to resolve a small issue where some of our users need to work with a particular type of Adobe Acrobat form, built with an old form system called XFA.

XFA allows users to create forms that can be submitted directly from within Acrobat Reader. You can send these forms out, have your users fill out some information, and hit submit, all from within the comfort of their own Reader, and it will shoot the information to you electronically.

This sounds like a very handy feature, but it comes with one caveat – it only works in copies of Reader on Windows and Mac OS. Any other version of Reader won’t work. So you can’t complete XFA forms on Linux, Android, Chromebooks, or iOS.

Firefox have made some effort to support XFA forms, but the particular forms we are working with – which come from a shipping line – don’t work. Some of the form interface elements simply won’t load (e.g., a select dropdown populated with some items) and then ultimately it can’t submit. I don’t blame Firefox for this – it looks like supporting XFA would be a pretty major project, and given the last update to it was in 2012, it feels like something of a technological dead end.

Unfortunately, at my workplace, about 50% of our staff are using Chromebooks. Fortunately, only a few people need to deal with the XFA forms. We’d rather not move them to Windows – Chromebooks suffice for literally everything else they need to do, and we’re not really set up to be a Windows org. Trying to get a multi-billion dollar shipping line to change established workflows built around decades-old technology has not proven to be an easy thing to do, so we’re left with solving this problem ourselves.

So I set about trying to find some alternatives to moving everyone to Windows for an edge case. The first thing that came to mind – we can just have a shared Windows machine that everyone can jump into to fill out these forms? Easy! Dust off hands, problems solved.

It turns out it’s not that easy. Here is why:

  • First of all, the Remote Desktop application on Chromebooks – which is the Android version from the Play Store – seems to be a bit garbage. It works, to some degree, but it has a long list of complaints raised about it continually disconnecting and generally not working properly. This makes me a bit nervous about deploying it and having people rely on it for mission-critical stuff.
  • There’s no really easy way to put a file on a Remote Desktop server if you’re not connecting in from Windows – if you’re a Windows client, you can just copy/paste the file directly from one desktop to another. Chromebook users can’t do that, & we don’t want to have to worry about users logging into their various accounts on the Windows server just to put a single PDF on there, as they’d have to remember to logout. (Just leaving a whole separate Windows machine in the office has these difficulties, compounded by the fact that people work from home so it’s not always available).
  • The main activity they need to do is filling out these forms, which is mainly done by copying and pasting data from a web site into the PDF. Copy/pasting seems a bit suspect in RDP from the Chromebook – sometimes it seems to not work. It simply wouldn’t be viable to present this as an option if we have to tell our users that copy and paste won’t work with 100% reliability.

To try to work around some of these problems I looked at a few different things and had many adventures. I have summarised these below:

Remote Desktop with GCPW on Windows Server 2022

I had hoped Remote Desktop on a Windows server with the Google Credential Provider for Windows installed might fix things. This should mean that users can log in with their Google Workspace accounts and have access to their usual profile stuff, including their files. This would make the file sharing stuff a lot easier, and ensure they have access to their browser profile as well, which might help with moving files around.

But this has a few problems. For one, GCPW is not even supported on Windows Server. However, a bit of websearching shows that it does work, but not without some headaches.

The main headache has to do with the security and encryption used by modern Windows Server RDP. It uses this thing called NLA (Network Level Authentication), which is the thing in the RDP client that prompts for your user credentials before you’ve connected to the server. Without NLA, you first connect to the desktop of the remote machine, and then log in as usual.

With NLA, GCPW doesn’t work, because the server doesn’t have the opportunity to start the Google authentication login dialog flow.

But! You can disable NLA. That should fix it, right?!

Kind of. Disabling NLA is a two-sided thing – you need to disable it on the server, and on the client. To disable NLA on the server, you go to Settings -> Remote Desktop -> Advanced Settings, and uncheck the NLA checkbox:

That will fix the server side. But you also need to set the client to not use NLA.

But this is not something you can do in mstsc.exe (the Windows RDP client)! There is no option you can check to disable NLA, so even with NLA disabled on the server, the client will still prompt you when you start the connection.

Fortunately I stumbled across this random gist which explains the solution – you need to save a .rdp file, and edit it to add the following line:

enablecredsspsupport:i:0

(It turns out Microsoft do have this documented, but I could only find it once I already knew the above magic setting.)
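
For reference, a minimal .rdp file along these lines should be enough to test with – the full address field is the standard one mstsc writes when you save a connection, and the hostname is a placeholder:

full address:s:your-server-hostname
enablecredsspsupport:i:0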

With that option set in my .rdp file, I was able to connect to the desktop of the server and see the GCPW login.

Logging in will take you through the normal Google login flow, but after authenticating, you might get an error – GCPW will create the user account after auth, but (unless you’ve already configured your Windows server to allow everyone RDP access) that user will probably not be in the RDP user group. Fixing this requires manually adding that user into the Remote Desktop users group, which is easy:

  1. Open the Users/Groups control panel: Start -> Run -> lusrmgr.msc
  2. Click on the Users and look for the newly created GCPW user. Their username will be something like %firstname%_%email_domain_bit% – e.g. if your email address is david@example.com, the username will be something like david_example
  3. Double click/open that user, select “Member of” tab, and add to “Remote Desktop Users” group.
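
If you’d rather do this from an elevated command prompt, the equivalent one-liner (using the example username from above) is:

net localgroup "Remote Desktop Users" david_example /add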

Once that is done you should be able to log in with your Google account as normal.

(Note: I re-imaged this machine and installed it again from scratch to make sure it worked the same way a second time. It didn’t. On the second time, when I tried to log in with the GCPW setup, I got the message “Your administrator doesn’t allow you to sign in with this account”. Fortunately Google are all over this message and have a FAQ addressing it – the fix is to simply add the registry key HKEY_LOCAL_MACHINE\Software\Google\domains_allowed_to_login with a string value of your Google Workspace domain. I don’t know why the GCPW installer did this correctly the first time, but not the second time?!)
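
If you hit the same thing, the registry change can be made from an elevated prompt with something like this – treating domains_allowed_to_login as a string value under HKLM\Software\Google, which is how I read that FAQ, with the value data being your Workspace domain:

reg add "HKLM\Software\Google" /v domains_allowed_to_login /t REG_SZ /d example.com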

Once this was working I thought I was pretty much done – now, to just get this going from a Chromebook! (Like an idiot, I didn’t think about the Chromebook aspect of this before I started.)

It turns out the Remote Desktop app available for Chromebook (at the time of writing, it is v10.0.19.1290) does not support connecting without NLA. There is no client option to disable the NLA authentication flow. You can copy over a .rdp file and open it, but it still prompts as if NLA is enabled, so my guess is the current version of RD for Android simply requires NLA for all connections.

(There was a version of the Remote Desktop app (version 8) which, just because it was older, may have worked for NLA-less connections. Based on the comments in the current RD app on the Play Store, version 8 was superior in a lot of respects, so who knows. Unfortunately it is no longer available from the Play Store, but a lot of people seem to be downloading it from random places as a .apk to keep using it – which says a lot.)

So – connecting from Chromebooks directly using GCPW doesn’t seem like an option. What else can we try?

Windows Server 2022 Remote Desktop Web Client

I vaguely remembered that Windows users could Remote Desktop from a web interface, so maybe I could set that up?

This was a wild adventure. It’s not just a checkbox you can click to turn on the web client. First of all, you need to install the Remote Desktop Services feature.

This is a huge process and comes with its own lengthy instructions. I won’t go into all the problems and hassles because they’re kind of out of scope, but I’ll mention a few of the main things that popped up.

  • You can’t install the Remote Desktop Services – as far as I can tell – unless your Windows Server is part of a domain. This can be an old school domain or a newfangled Azure Active Directory thing. I set it up as a normal domain so I didn’t have to muck around with Azure.
  • To do this you need to add the Active Directory Domain Services feature – Server Manager -> Role-based or feature-based installation -> Add Active Directory Domain Services.
  • This failed for me with a random error, “Windows Server 2016: One or several parent features are disabled so current feature can not be enabled. Error: 0xc004000d”. I found this random blog post from 2018 which had some arcane PowerShell solution, which I’ll reproduce here for posterity:
    • Open an elevated PowerShell prompt and enter the following to check if the .NET Framework Feature is installed:
    • Get-WindowsFeature Net-Framework-Features
    • Then enter:
    • Get-WindowsFeature Net-Framework-Features | Remove-WindowsFeature
    • Finally repeat the first command to check the feature has been removed:
    • Get-WindowsFeature Net-Framework-Features
    • Restart the server.
    • Restart-Computer
  • I have no idea what this does, but it fixed the problem for me, and I was able to install Active Directory Domain Services.
  • Once the server was in a domain, I was able to blather my way through the install process for Remote Desktop Services. I used this guide as a basis.
  • Once that is done, you need to do another completely insane manual PowerShell process, but this one is part of the official Microsoft documentation: Set up the Remote Desktop web client for your users. This also had some problems:
    • You’ll need to update to the latest PowerShell version first. The article doesn’t mention it but it doesn’t work with the default PowerShell installed on Windows Server 2022 at least.
    • Step one is “On the RD Connection Broker server, obtain the certificate used for Remote Desktop connections”. I wasn’t quite sure what this meant, but figured out it means to just open the “Manage user certificates” tool, look for the Remote Desktop category, and export the certificate in there. There was only one for me; I picked the first export option and saved it as a .cer file.
    • Step 6 is importing that certificate via PowerShell (see the sketch below). The instructions say: “Next, run this cmdlet with the bracketed value replaced with the path of the .cer file that you copied from the RD Broker”. It should say “… replaced with the full absolute path” – I kept getting random PowerShell errors ('Exception calling ".ctor" with "1" argument' kind of thing) until I used one. You can also run the cmdlet with no parameter, and it will prompt you for the path.
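
For reference, the PowerShell side of that web client setup boils down to something like the following – cmdlet names as I remember them from the Microsoft article, so double-check against the current doc, and note the absolute path on the import:

Install-Module -Name RDWebClientManagement
Install-RDWebClientPackage
Import-RDWebClientBrokerCert "C:\full\path\to\exported-cert.cer"
Publish-RDWebClientPackage -Type Production -Latest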

A brief segue – RemoteApps!

After all that, I kind of forgot what I was doing because I discovered these things called “RemoteApps” – basically, a way to let people Remote Desktop into a specific app! I started playing around with them as I thought this might be a neat compromise – people could just RD directly into Acrobat Reader and wouldn’t even need to deal with all of the rest of the experience!

Unfortunately this seemed a bit painful as it meant I still needed to figure out a way to get these PDFs onto the server in a place where the Acrobat process could access them.

Anyway, RemoteApps looked cool and worked perfectly once they were set up, ignoring the other problems, so that is something I’ll be coming back to.

Guacamole – an open source Apache project supporting Remote Desktop

I had recently heard about Guacamole, which looked interesting as a piece in this puzzle – perhaps it would solve one or more of the problems.

It was a real chore to get running. The documentation is great in some parts but confusing and contradictory in others. I started with the recommended Docker process in the hope that it would avoid compiling from source and mucking around with Linux too much, but I ran into a ton of problems with the Docker setup:

  • The Docker version seems to have a hard requirement on one of the authentication modules, like MySQL. Some of the older documentation refers to a “noauth” solution (and LLMs will uselessly suggest this is an option) but lots of trial and error and reading docs and support tickets finally led me to the conclusion that with the Docker version there is no way to do this without an auth layer.
  • Setting up MySQL is something I’ve done a million times, but not with Docker. My first experience sucked because the Docker instructions for MySQL seem wrong – the official image instructions say you can connect to the mysql CLI tool, but that flat out didn’t work for me, so I had to muck around trying to get that working. (I imagine Docker pros won’t have this issue but it was annoying that the most basic default instructions failed.)
  • The RDP server I want to connect to is on our Tailscale tailnet. The Docker container can’t access the tailnet without setting up the Tailscale sidecar. This looks like a decent solution but I am not familiar or confident enough with Docker to bother mucking around with it.

So, deep sigh, let’s try compiling from source.

This turned out to be utterly painless and very fast. The biggest issue I had was deploying the .war file for the client – I set up a new Debian box, which installs Tomcat 10 by default. Everything looked like it was working, but I would just get random errors trying to load the Guacamole client.

I eventually discovered that this is just a basic compatibility issue with Tomcat 10 (which moved the servlet API from the javax.* to the jakarta.* namespace – the Guacamole web application still expects the old one, so it needs Tomcat 9 or earlier). It’s frustrating that this info isn’t front and centre in the Guacamole documentation (or maybe it is and I just missed it).

I downgraded the box to Debian 11 and tried again with Tomcat 9 and it worked perfectly first go.

I ran into a few small issues that could be documented a bit better. I’ll summarise some things if you don’t want to read all the docs:

  • First and most importantly, you do not need to set up a MySQL server for authentication, or any of the other authentication types. It will work fine out of the box with the basic authentication provided by the user-mapping.xml file.
  • You need to create a /etc/guacamole directory. Download the example user-mapping.xml file from their GitHub repo.
  • Modify the XML file as required to test it out. I’d recommend testing with the default built-in accounts (e.g., USERNAME2 / PASSWORD) to start with, just to make sure you can log in to the Guacamole interface and the RDP connections are working as expected.
  • Define your own connection(s). Note that the hashed passwords need to be generated with echo -n [password] | sha256sum, not just echo.
  • I struggled to get NLA-less connections going. The Guacamole documentation references NLA and no NLA repeatedly so I was sure it was possible but I couldn’t find the right combination of parameters. Eventually I figured it out through trial and error, so here it is:
    <connection name="acrobatbox">
        <protocol>rdp</protocol>
        <param name="hostname">8.8.8.8</param>
        <param name="port">3389</param>
        <param name="security">tls</param>
        <param name="disable-auth">true</param>
        <param name="ignore-cert">true</param>
        <param name="resize-method">display-update</param>
        <param name="normalize-clipboard">windows</param>
    </connection>
  • The key bits in the above are setting security to tls and disable-auth to true – this is the magic combination to disable NLA on the client side.
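
For completeness, that connection block sits inside /etc/guacamole/user-mapping.xml roughly like this – the username and password hash are placeholders, with the hash being the sha256sum output mentioned above and the encoding attribute set to match:

<user-mapping>
    <authorize username="someuser"
               password="(output of: echo -n 'thepassword' | sha256sum)"
               encoding="sha256">
        <connection name="acrobatbox">
            <!-- parameters as listed above -->
        </connection>
    </authorize>
</user-mapping>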

Once that was done, I finally had an RDP connection working via Guacamole which respected the need for no NLA, so I could log into the desktop with GCPW – success, right?!

Well, no. It turns out that Guacamole is a great piece of software which works amazingly well and covers a lot of ground, but unfortunately kind of sucks to use if one of your main requirements is using the clipboard a lot.

The way the clipboard in Guacamole works is… weird. The details are in the documentation, which, had I read it before embarking on this journey, might have put me off bothering at all.

The basic gist of it is you have a menu available when you’re connected to the server via Guacamole. In that menu is a text area. You can use that text area to move information back and forth between the two systems.

This is obviously not a great user experience. It must come up a lot, because it’s mentioned in their FAQ under a question that is approximately “why don’t you just make the clipboard work?”

The answer there implies, though, that it does work, at least as long as you’re using a modern browser that supports the Clipboard API. I was testing with the latest Chrome so figured I’d be OK, but it still wasn’t working.

Some more investigation revealed the problem: Chrome will only allow access to the clipboard if the page is served via HTTPS. I set up an nginx proxy in front of it and tried again in a new browser – this time it actually prompted with a browser notification to allow access to the clipboard, and it worked as expected, with the usual bi-directional clipboard flow working pretty much perfectly. The one exception to this is the login screen, where you can’t copy/paste the password, which is a small frustration, but probably manageable for most users.
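
For reference, the nginx side is essentially the reverse-proxy example from the Guacamole docs plus TLS – the server name and certificate paths are placeholders, and the Upgrade/Connection headers matter because Guacamole runs over WebSockets:

server {
    listen 443 ssl;
    server_name guac.example.com;

    ssl_certificate     /etc/letsencrypt/live/guac.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/guac.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080/guacamole/;
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $http_connection;
        access_log off;
    }
}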

Resurrecting old PGP Virtual Disks with PGP Desktop v8

I recently needed to recover some information from a PGP Virtual Disk (a .pgd file) that I created back in about 2012. This drive was created with PGP Desktop v8.

After a long adventure, it turned out that the information I wanted wasn’t even stored in this disk, but it was such a hassle to figure this out I thought I’d write up a bit of it in case it’s useful to others.

PGP Desktop was, back at that time, software sold by PGP Corporation. This company was later purchased by Symantec, and the application was renamed Symantec Encryption Desktop. In 2019, Symantec’s enterprise security business (along with this product) was bought by Broadcom.

At some point along this journey, it seemingly became basically impossible to buy this software as a single user. As far as I can tell, you can only buy the software from Broadcom (or more accurately, from a Broadcom distributor) as part of a volume licensing arrangement, starting at something like 50 seats.

This is obviously ludicrous if you just want a single license to muck around with. This fits with what little I know about Broadcom – that they are a company only interested in working with enterprise clients, and everyone else can more or less get lost.

I finally found the installer for PGP on an old backup drive. Hoping this is useful to someone else looking to resurrect old data, the details are:

Filename: PGP8.exe
sha1sum: 38c43ef41b9c15996bdaeb1711d2ca0afe99ada5

I am unwilling to post the file due to copyright concerns, but from a quick look it seems to be available online, and as long as the sha1 matches, you should be OK. (Not a guarantee. Use with caution.)
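
Checking the hash of whatever copy you find is quick on either platform:

# Linux
sha1sum PGP8.exe

# Windows
certutil -hashfile PGP8.exe SHA1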

Unfortunately this would not install on Windows 10 – it started the install process and seemed to complete, prompting for a reboot, but it wouldn’t install any shortcuts or map any files, and it wouldn’t start the activation process. I suspect whatever kernel-level magic it needs is simply not compatible with Windows 10.

I figured Windows XP would work, but then it was a struggle to find a Windows XP machine. I ended up using the following ISO which I found on archive.org:

Filename: en_windows_xp_professional_with_service_pack_3_x86_cd_vl_x14-73974.iso
sha1sum: 66ac289ae27724c5ae17139227cbe78c01eefe40

I was hugely reluctant to use this ISO, like I am with any random executables on the Internet, but that sha1sum shows up in searches in documents well over 10 years old, and seems relatively reliable. Of course, approach with extreme caution, and limit your exposure as much as possible.

The PGP Desktop installer worked perfectly on a VirtualBox VM spun up with this ISO – until I got to the activation stage. I tried my original license details, but of course the activation servers are long since gone, so short of reverse engineering the activation process and faking a valid response (which sounds fun but who has the time), this is a no go.

Fortunately, PGP Desktop was created in an era where the Internet was still considered to be not always present, and the creators built in a manual, offline activation process.

A quick search and I found a random manual license authorisation that worked perfectly. I am reluctant to post the details here but here are some clues to help you find one – there are many references to them online. A websearch for the following string, with quotes, should give you some options:

"BEGIN PGP LICENSE AUTHORIZATION" "PGP Desktop" "PGP Enterprise"

Once I had this done, I was able to mount my old .pgd file and access the information (to discover at this point the information I wanted wasn’t even in there).

Understanding AWS Cost Explorer usage records

AWS has a tool called Cost Explorer that allows you to dive into your AWS services to understand what you’re paying and where.

Being AWS, if you want more detail than what you get by default, you can also pay more for a mildly upgraded version of Cost Explorer, which gives you hourly granularity – so if you’re testing a new service and want to really make sure the costs aren’t going to blow out, you can do so a little more safely by monitoring it on an hourly basis.

The documentation is a little vague about this though – there is a page called “Estimating cost for Cost Explorer hourly granularity” which explains Cost Explorer will start billing you for ‘usage records’ once you move into hourly granularity – but there’s no clear mention of what these are or how to track them.

When you turn this feature on you’re more or less forced to rely on their estimates of how many usage records will be used. I personally always find this completely daunting, given there is no way to cap your spend, so I was very keen to understand exactly how to at least measure my ‘usage records’ once I’d enabled hourly reporting, because I couldn’t really find any metrics for it anywhere obvious.

While I haven’t been able to answer that question, I can report that once you have enabled this feature, it does add Cost Explorer to the list of services available to filter in Cost Explorer itself:

This means you can at least track your overall spend on usage records, if not the actual number consumed.
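
If you’d rather watch it from the CLI than the console, the same filter can be expressed against the Cost Explorer API – a sketch, assuming the service appears under the name “AWS Cost Explorer” in the SERVICE dimension (check the console filter list for the exact string) and with placeholder dates:

aws ce get-cost-and-usage \
  --time-period Start=2024-06-01,End=2024-07-01 \
  --granularity DAILY \
  --metrics UnblendedCost \
  --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["AWS Cost Explorer"]}}'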

As far as I could tell, even this information – that Cost Explorer would appear in Cost Explorer once you enabled hourly granularity – is not in the documentation anywhere. Amazon Q, their AI robot chat agent thing, is thus not able to provide useful answers on the topic, just referring you to how it’s estimated and priced.

A Random Solution to 100% CPU Usage in Windows

Like a zillion people on the Internet, my laptop (a Dell XPS 7590) occasionally starts going nuts with the fans, because a CPU core is pegged at 100% for no obvious reason.

Task Manager/Resource Monitor all uselessly show it’s System, and trying to debug further in Process Explorer just leads you down a rabbit hole of potential Windows problems.

After picking a few of these at complete random I luckily eventually stumbled across something that fixed it – changing the Power Options related to USB settings. Specifically, disabling the ‘USB selective suspend setting’ when the system is plugged in.
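
I made the change through the Power Options GUI, but powercfg can do the same thing from an admin prompt – run the query first and grab the two GUIDs listed under “USB settings” and “USB selective suspend setting”, then plug them into the second command (0 = disabled for the plugged-in/AC case):

powercfg /query SCHEME_CURRENT
powercfg /setacvalueindex SCHEME_CURRENT <usb-settings-guid> <usb-selective-suspend-guid> 0
powercfg /setactive SCHEME_CURRENT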

This immediately made the 100% CPU pegging go away, solving a problem that had plagued me for over a year. No idea if it was related to the USB devices I had plugged in – I had the same problem at home and in the office, where I have completely different equipment.

No idea what the potential impact of this is, but I find it hard to believe it’s worse than my CPU spinning randomly.

AWS Transfer Family SFTP Directories Are A Bit Weird

There are still a lot of people out there with SFTP (and even FTP!) based workflows. Amazon know this and have a dedicated product called AWS Transfer Family, which is basically an amazingly expensive SFTP wrapper that lives on top of S3.

If you don’t want the hassle of running SFTP on a $5/mo virtual server, then paying AWS on the order of USD$200/mo might be a good option.

There is some slightly weird behaviour compared to standard SFTP that caught me by surprise relating to directories.

(Note: I am doing this on a client’s SFTP setup, so I don’t know what it actually looks like on the S3 side.)

  • If you try to rename a file into a directory that does not exist, you will not get an error – it will actually work, and create some sort of “virtual subdirectory” in the S3 bucket. e.g., if you do rename example.txt backup/example.txt, without the backup/ directory existing, and then do a directory listing, you’ll see there is a new backup/ directory that was created by that rename operation.
  • If you then move the file back – rename backup/example.txt ./example.txt – the backup/ directory will disappear.
  • If you create the backup/ directory first, and repeat the move in and out, the directory will persist.
  • If the backup/ directory was created by the rename command, and you then try to do an ls * on the parent directory, it will return the files in backup/ as well – i.e., it will act like a recursive ls.
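
Put together as an annotated sftp session, the behaviour looks something like this (filenames from the examples above; the notes in brackets are mine):

sftp> rename example.txt backup/example.txt    (backup/ doesn't exist – no error)
sftp> ls
backup
sftp> rename backup/example.txt example.txt
sftp> ls                                       (backup/ has vanished again)
example.txt
sftp> mkdir backup                             (created explicitly, it persists through the same dance)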

If you are trying to get closer to standard SFTP-based behaviour with directories, I suspect it’s safer to manually make the directories first (as you would normally) instead of relying on this weird automatic directory creation you get from the rename.

\d vs [0-9] in JavaScript/node.js regex

I was trying to debug why a seemingly very simple regular expression in JavaScript was failing.

The goal was to catch requests going to an API endpoint that looked like:

/endpoint/145,14,93

The regex I had was working fine in regex101.com’s simulator:

^/endpoint/\d+,\d+(?:,\d+)*$

But running under node.js, it wouldn’t work – it would catch a single digit, but not any subsequent digits.

Spent a while trying different things – mostly assuming I was doing something boneheaded due to my lack of familiarity with node.js. A colleague verified the same thing and also wasn’t sure.

I then realised it worked fine if I replaced the \d with [0-9]. I thought this was weird – the MDN documentation says:

Matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches “2” in “B2 is the suite number”.

… which made me assume they were the same thing.

After much websearching & the usual difficulty in finding meaningful results with search terms like “\d”, in desperation, I thought I’d ask ChatGPT, and got the following result:

The fourth point seems to be the case – the \d is also matching the comma.

I’m sure this is documented somewhere (otherwise how else would ChatGPT know about it?!) but I couldn’t find it referenced in any of the stuff that came up through common search terms.
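
If you want to poke at this yourself, a pattern can be tested under node straight from the shell, which makes comparing against regex101 much quicker – this one-liner uses the [0-9] form that worked for me:

node -e "console.log(/^\/endpoint\/[0-9]+,[0-9]+(?:,[0-9]+)*$/.test('/endpoint/145,14,93'))"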

SEO? More like SE-NO!

[Hilarious alternate title: Dealing with SEO: How to go from SE-No to SE-Oh Yeh.]

I wrote this article back in 2014 for our agency’s blog. It was never published – I suspect our sales & comms people didn’t like it as it conflicted with some of our service offerings.

While it reads a bit dated, I think the core tenets are still more or less correct, and I just thought the title was very funny, so here we are.

SEO? More like SE-NO!

Until social networking came along, search engine optimisation (SEO) was the undisputed king of web buzzwords. If you weren’t doing SEO, then you were crazy and your website – and by extension, your business – was going nowhere. 

SEO now has to vie for mindshare against a social media strategy, but it has a long legacy and thus is still heavily entrenched in the minds of anyone who is trying to do anything in the online space. 

If you’re running a website for your business, and you’re not a technical person familiar with the intricacies of SEO, you might have been concerned about this – how do you make your website stand out? How do you set things up so that when someone types in the name of your business or your industry, you show up on the first page? 

In short – do you need SEO?

Well, vast hordes of SEO specialists, agencies, companies and consultants have sprung up over the preceding years to help answer these questions. Sounds promising, right? In an increasingly knowledge-based economy, it’s obviously helpful to have a bunch of people who have devoted themselves to becoming experts on a topic, so you can leverage their abilities. Great!

Unfortunately, things aren’t great in the world of SEO. Things are messy. Let’s have a look at why. 

What is SEO?

First up – what the heck is SEO, anyway? “Search engine optimisation” sounds pretty clear-cut – everyone needs to be on search engines! But what actually is it? When someone “does SEO”, what exactly are they doing?

The short answer is: it could be anything. “SEO” is not the sort of hard, technical term that is favoured by computer nerds like us. There’s no specification, there’s no regulations, there’s no protocols – there’s not even an FM to R. 

In a nutshell, SEO means making changes to a website to improve how search engines react to it. It can be as simple as making sure you have a title on your page – for example, if your business is a coffee shop, you might want to make sure you have the words “coffee shop” somewhere in the title. It can be complicated, too – like running analyses on the text content on your site to measure keyword density. 

Changes can also be external. One of the biggest things that impacts a site’s rankings in search results is how many other people on the Internet are linking to you. So one SEO strategy is to run around the Internet and make a bunch of links back to your site. (We’ll talk about this a bit more later.)

Other technical things might influence SEO as well. Google recently announced that whether or not a site used HTTPS (the secure padlock dealie that means your website is secure for credit card transactions) would start having some impact on rankings. 

As we can see here, there’s a bunch of different things that can affect your SEO – and I’ve only listed a handful of them. There are more – and they all are interrelated. 

As if that wasn’t complicated enough, there’s something else that affects where you end up in search results – the person who is searching. Where you are will change things – if I’m searching for coffee shops, I’m more likely to get results that are geographically closer to me. If I’ve done a lot of searches for certain terms, I’m more likely to see results based on those terms. 

If you have your own website and regularly visit it, it is possible that will affect the rankings as you see them. If you search for yourself you might see your ranking up higher than someone else doing the exact same search located in the next street – or the next town, state, or country. 

What’s the practical upshot?

In short: SEO is complicated. There are lots of variables, and they are hard to control. 

That’s not even the really bad part: the only people who know exactly how the search ranking system works are the search engines themselves. No matter what you do, the outcome is still 100% determined by whatever is going on with the search engines on any particular day. 

No matter what you’re told, anything anyone knows about how to “do SEO” comes from one of two sources: information made publicly available by search engines, and from reverse engineering search engine behaviour by experimentation. 

You might invest large amounts of time and effort (and money) in trying to execute a particular SEO strategy, only to have Google come along the next day and announce they’ve changed everything, and your investment is largely wasted. 

SEO is a moving target. Things are constantly in flux, in no small part due to the huge number of people attempting to game the system by any means possible – in a world where a top ranking in a competitive search result can mean a huge increase in sales in a very short time, getting an edge is a big deal. And many slightly more nefarious techniques – usually dubbed “black hat SEO” – have emerged, which in many cases can do massive damage to your rankings. 

As if all that wasn’t traumatic enough… your ranking is something that evolves over time. A new website won’t appear in search results immediately at all; it might take a few days to show up, and in most circumstances will be low in rankings. If you’re in a competitive space, it might take you months to even register on the first few pages of results. 

This means it is very, very hard to do any sort of significant or reliable experiments with SEO in a short timeframe. You can’t change a few words and then instantly check how they affect your rankings. You have to wait – a long time – to see if it has any effect. During that time, there will be any number of other changes that have occurred, making it hard to confirm if your experiment worked. 

Doing SEO scientifically is hard. Measuring cause and effect is hard enough in small experiments when there are few variables and they can be tightly controlled. In SEO there are many variables, constantly in flux, known only to the clever engineers that write and evolve the ranking algorithms – the secret sauce that drives how every search engine works. 

I said what’s the practical upshot!

Oh, right. Well, the practical upshot is that the world of SEO providers is full of people over-promising and under-delivering. 

This is the big risk of paying for SEO services. Because it’s such a vague, hand-waving term that encompasses so many different areas, there are, sadly, a number of operators in the space that use it as an opportunity to provide services that are not quantified or qualified in any meaningful way. 

Because of the complexity of the systems involved, it is practically impossible to deliver a promise of results in the SEO world. You might get promised a first page search result, but it is extremely difficult to deliver this, especially in competitive spaces – if you’re trying to get your coffee shop on the first page of Google results for the term ‘coffee shop’, you’ve got a long road ahead of you. 

Worse, there are black hat operators that will do things that look like a great idea in the short term, but may end up having huge negative ramifications. “Negative SEO” is one of the more recent examples. 

As a result, there are plenty of witch doctors offering SEO snake oil. Promises of high rankings and lack of delivery abound – followed by “oh, we need more money” or “you need to sign up for six months to see results”. 

One only needs to look at the SEO sub-forum on Whirlpool –  one of the most popular communities in Australia for those seeking technical advice – to see what a train wreck the current SEO market is. At the time of writing there’s a 96 page thread at the top with unsatisfied customers of one particular agency. There are stacks of warnings about other agencies. Scroll through and have a look. 

Customers of many SEO agencies are not happy, and it’s because they’re paying for something they don’t really understand without getting crystal clear deliverables. 

The situation is so bad that the second sentence on Google’s own “Do you need an SEO?” page states: 

Deciding to hire an SEO is a big decision that can potentially improve your site and save time, but you can also risk damage to your site and reputation.

Some other interesting terms used on that page: “unethical SEOs”, “overly aggressive marketing efforts”, “common scam”, “illicit practice”… indeed, the bulk of the document explains all the terrible things you need to watch out for when engaging an SEO. 

(I should stress that this is not a general statement that encompasses all those who perform SEO. There are many smart and dedicated people out there that live on the cutting edge of search engine technology, doing only white hat work, delivering great things for their clients. The hard part is finding them in the sea of noise.)

Cool story. What does this mean for me?

Back to the original question – do you need SEO? 

There’s no right answer. It’s a big question that encompasses a wide range of stuff. Without looking at your specific situation it’s hard to tell how much effort you should put into SEO at any given point in time. 

Remember: there’s no clear-cut magic SEO bullet that will do exactly what you want. But one thing is for sure – someone will happily take your money. 

If you decide to engage someone to help optimise your website for search, here’s a quick list of things to pay attention to:

  1. Carefully read Google’s “Do you need an SEO?” document, paying particular attention to the dot points at the bottom. 
  2. Establish clear deliverables that you understand – you need to make sure that you know what you’re paying for, otherwise what you get will be indistinguishable from nothing. 
  3. Tie any payments (especially ones involving large amounts) to performance metrics – but don’t be surprised if they’re not interested in doing this. (What does that tell you?)
  4. Remember that anything that is not a simple content update that you can do yourself might have other costs – for example, changing page layout or adding new tags might require you to get web developers on board. 
  5. If you’re building a new site from scratch, make sure your developers are factoring in SEO right from the outset. Almost any decent developer will be using a framework or software that takes SEO into consideration, and as long as they – and you – are paying some attention to Google’s SEO Starter Guide (EDIT: 2018 version is here: https://support.google.com/webmasters/answer/7451184?hl=en ) you’ll end up in a much better position. 
  6. Strongly consider search engine marketing (SEM) instead. SEM is the thing where you pay companies like Google money to have your website appear in search results as ads, based on specific terms. The Google programme – AdWords – gives you incredible control over when your ads appear, and you also get excellent data back from any campaigns. With AdWords you can actually effectively measure the results of your work – so you can scientifically manage your campaigns, carefully tracking how every one of your marketing dollars is performing.