River: A fast, robust job queue for Go and Postgres
When a job is emitted to a Redis-based queue from inside a transaction, a worker can pick it up before the transaction that emitted it has committed, so none of the data the job needs is available yet. Emitting after the commit has its own problem: there's a brief moment between the commit and the job emit where, if the process crashes or there's a bug, the job is gone, requiring manual intervention to resolve. Postgres' NOTIFY respects transactions, so the moment a job is ready to work, a job queue can wake a worker to work it, bringing the mean delay before work happens down to the sub-millisecond level.

Despite our operational trouble, we never did replace our database job queue at Heroku. So a few months ago, Blake and I did what one should generally never do, and started writing a new job queue project built specifically around Postgres, Go, and our favorite Go driver, pgx.

A big part of our queue problem at Heroku was the design of the specific job system we were using, and the way Ruby is deployed. In that setup, every worker contended with every other worker for each new job to work, and iterated over millions of dead job rows every time.