Postgres Query Queries

I’m using Postgres in a production capacity for the first time and have been excited to get my hands on it after decades of MySQL and being yelled at by other nerds for not using Postgres.

Doing some maintenance on a large-ish (~7m rows) database, I was somewhat disappointed by how the default optimiser doesn’t seem to do much with relatively basic sub-queries. I had always thought one of the big weaknesses of MySQL was terrible sub-query performance, and in my head, it was one of the strengths of Postgres.

Example:

SELECT * FROM logs WHERE uuid IN (SELECT uuid FROM logs LIMIT 1)

There is no index/primary key on uuid. This query is ridiculously slow. The optimiser obviously does not magically figure out it can do the sub-query first and just operate on the results.

EXPLAIN says:

  1. "Hash Semi Join (cost=0.13..788186.68 rows=1 width=644)"
  2. " Hash Cond: ((logs.uuid)::text = (logs_1.uuid)::text)"
  3. " -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=644)"
  4. " -> Hash (cost=0.12..0.12 rows=1 width=37)"
  5. " -> Limit (cost=0.00..0.11 rows=1 width=37)"
  6. " -> Seq Scan on logs logs_1 (cost=0.00..769721.60 rows=7034260 width=37)"

Using Common Table Expressions, it is very fast:

WITH logs AS (SELECT uuid FROM logs LIMIT 10)
SELECT * from logs

EXPLAIN says:

  1. Limit (cost=0.00..1.09 rows=10 width=37)
  2. -> Seq Scan on logs (cost=0.00..769721.60 rows=7034260 width=37)

I am sure this is Postgres 101 stuff, but am just mildly disappointed that such a seemingly basic query doesn’t magically Just Work.

Leave a Reply

Your email address will not be published.