
If you are running schema migrations in production, use "lock_timeout". Even seemingly benign modifications like dropping a table (with foreign keys) or dropping a foreign key, which are generally quick and may run nearly instantaneously when you're testing them, may end up hitting a lock conflict on a heavily used production database (with existing transactions, a running autovacuum, etc). That ALTER is then waiting on the first transaction's lock, but it has acquired an ACCESS EXCLUSIVE lock, meaning no queries can run against the locked table.

If you're doing any Postgres at scale, it's just a matter of time until you hit one of these conflicts. "lock_timeout" makes the migration fail after the timeout instead of blocking all other queries.
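
As a minimal sketch (the table and constraint names are made up), the migration session would do something like:

  SET lock_timeout = '5s';  -- give up after 5s instead of queueing indefinitely

  BEGIN;
  ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;
  COMMIT;

If the ALTER can't get its lock within 5s it simply errors out and you retry later, instead of stalling all traffic on the table behind it.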




'statement_timeout' includes the lock wait and lets you better estimate the impact on hot tables. When you set the timeout to 5s, you know the downtime will be at most 5s total, and transactions will continue afterwards. With lock_timeout, you don't control how long the part after the lock is acquired will take - it may be fast or slow, due to concurrent traffic, for example.
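
Roughly (values purely illustrative):

  SET statement_timeout = '5s';  -- bounds the whole statement: lock wait + the actual work
  SET lock_timeout = '2s';       -- bounds only the time spent waiting for the lock

With only lock_timeout set, an ALTER that grabs its lock quickly can still hold it for an unbounded time while it does its work.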


Related to this, there is quite a lot of variation across Postgres versions in whether a particular DDL statement will take an exclusive lock or not.

Is there a good way to analyse a query and be informed of what sort of lock it will take?

I’ve always resorted to re-reading docs when I’m unsure.


For me the best way is testing it locally, side-by-side, with two transactions. You don't need any data for this.
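
Something like this (hypothetical table), with two psql sessions:

  -- session 1: run the DDL in a transaction and leave it open
  BEGIN;
  ALTER TABLE orders ADD COLUMN note text;

  -- session 2: see which locks session 1 is holding
  SELECT locktype, relation::regclass AS rel, mode, granted
  FROM pg_locks
  WHERE pid <> pg_backend_pid();

Then ROLLBACK session 1 and try the next variant.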

After some experience, you start to see the reasons for locks and how they will impact you.

Or read the docs.


Excellent advice.

On the technical side, I believe the waiting is due to the lock queue rather than to having acquired an ACCESS EXCLUSIVE lock. The ALTER is specifically _waiting_ for any lock lower than ACCESS EXCLUSIVE to be released.


It also makes all new readers/writers wait for that lock, essentially leading to downtime until the lock is eventually acquired and released. This is the classic readers/writers problem, and you want to avoid starving the writers.

That's why the size of the data is the least of your issues - it's the access patterns/hotness that are the problem.
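
Easy to reproduce with three sessions on a toy table (names made up):

  -- session 1: an idle-in-transaction reader holding ACCESS SHARE
  BEGIN;
  SELECT count(*) FROM orders;
  -- leave the transaction open

  -- session 2: the ALTER queues behind session 1, waiting for ACCESS EXCLUSIVE
  ALTER TABLE orders ADD COLUMN note text;

  -- session 3: even a plain SELECT now queues behind the waiting ALTER
  SELECT * FROM orders LIMIT 1;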


Sometimes if you do that, you'll never run your ALTER - it just needs enough traffic on the table. In those cases I think the best way is to kill other ongoing queries that block the ALTER, if your app can recover from that.
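
Something along these lines from another session (the pid of the waiting ALTER is a placeholder):

  -- terminate every backend blocking the stuck ALTER
  SELECT pg_terminate_backend(blocker)
  FROM unnest(pg_blocking_pids(12345)) AS blocker;  -- 12345 = pid of the ALTER's backend

pg_blocking_pids() exists since 9.6; pg_cancel_backend() is the gentler option if cancelling the blocking statements is enough.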


It might be kind of nice if Postgres had a "force this statement through" option which kills any conflicting processes.

We created something like that in Citus when changing a node's hostname (e.g. during a failover). While node updates should be mutually exclusive with writes (otherwise we might lose them), we didn't want to wait for long-running or possibly frozen writers to release their locks. So after some initial waiting we'd start a background worker to kill anything that was blocking the node update.



