Hacker News new | past | comments | ask | show | jobs | submit login
Parallel Snapshotting: make pg_dump and pg_restore multi-threaded per table (peerdb.io)
83 points by spathak 10 days ago | hide | past | favorite | 11 comments





Several years ago at a company I no longer work at, when we swapped from single-threaded pgdumps (from a replica) to multi-threaded pgdumps (-j flag), we occasionally saw some dumps which were "inconsistent". Not sure if that's the best term for it, but we would occasionally get duplicate foreign key errors on pg_restore or even if pg_restore succeeded, sometimes primary key sequences would be less than the maximum id of our table (so new inserts would fail). This was on standard postgres 13.3 (not aurora) on AWS RDS. Nothing really noteworthy about our setup either that I can remember. I tried reporting it at the time on both the postgres mailing list and with AWS support and unfortunately both pointed fingers at the other and I eventually gave up since it was not easy to reproduce. We unfortunately ended up having to go back to single threaded dumps which never produced any of these "corrupt" dumps but was obviously much slower.

I can’t speak specifically to duplicate foreign key errors but I possibly fixed that bug via a fix to parallel dump/restore in the 12-13 era. It would manifest as a dump that would fail on restore with errors related to schema dependencies (trying to restore database objects out-of-order).

I was expecting to see https://github.com/dimitri/pgcopydb when clicking this. How is this different?

OP here. pgcopydb is also a great tool. It does pg_dump|pg_restore. It cannot multi-thread (parallelize) single table data migration

seems like it can parallelize to an extent: https://pgcopydb.readthedocs.io/en/latest/concurrency.html#s...

It only uses pg_dump and pg_restore for transfer of the schema, the actual data transfer is a custom mechanism.

does pgcopydb support copying a single table in multiple threads?


Because this is advertising :)

Does this work with Azure|AWS Postgres as source and replicate to an onPrem Postgres and vice-versa?

Yes it should work for any postgres database as a source or a target. Let it be on the cloud or on-premise



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: