I have a system that generates a lot of data, around 3-4 TB per day. It receives data from multiple sources 24 hours a day. I also have a replica of this system, which would allow me some downtime if needed.
I only need the latest 24 hours of data available; the rest I can compress and store on HDDs.
How would you tackle this problem? Does PostgreSQL offer enough features to handle this without external software?
CodePudding user response:
You'll probably want to partition on the date and drop the previous day's partition each day. See Declarative Partitioning in the PostgreSQL docs for how to set that up. Use as recent a version of PostgreSQL as possible to get the latest partitioning features.
Using partitions makes dropping each day's data very fast and avoids bloating the table with dead rows.
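A minimal sketch of what that looks like with declarative partitioning (the table and column names here are my own invention, not from your system):

```sql
-- Parent table, range-partitioned by ingestion timestamp.
CREATE TABLE events (
    received_at timestamptz NOT NULL,
    source_id   int         NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (received_at);

-- One partition per day, created ahead of time.
CREATE TABLE events_2023_01_15 PARTITION OF events
    FOR VALUES FROM ('2023-01-15') TO ('2023-01-16');

-- Dropping a whole day is near-instant and leaves no dead rows.
-- DETACH first if you want to dump and compress the data before removal.
ALTER TABLE events DETACH PARTITION events_2023_01_14;
DROP TABLE events_2023_01_14;
```

The detach-then-drop pattern fits your archiving requirement: once detached, the old partition is an ordinary table you can `pg_dump`, compress, and move to HDD storage before dropping it.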
With vanilla Postgres you have to create and drop each day's partitions yourself; partitions can be made in advance by a script run periodically. Alternatively, a tool such as pg_partman can manage the partitions for you: use Postgres native partitioning with a `p_type` of `native` and a `p_interval` of `daily`.
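If you go the pg_partman route, the setup might look roughly like this. This is a sketch assuming pg_partman 4.x installed in the `partman` schema and the same hypothetical `events` table as above; the retention settings are my addition, showing how old partitions can be detached for archiving rather than dropped outright. (In pg_partman 5.x the arguments changed: `p_type` was removed and `p_interval` takes values like `'1 day'`.)

```sql
-- Register the parent table with pg_partman (4.x syntax).
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control      := 'received_at',
    p_type         := 'native',
    p_interval     := 'daily',
    p_premake      := 2   -- keep two future daily partitions pre-created
);

-- Optional retention: partitions older than one day are detached
-- automatically. retention_keep_table = true detaches instead of
-- dropping, so each old partition can be compressed and archived first.
UPDATE partman.part_config
   SET retention = '1 day',
       retention_keep_table = true
 WHERE parent_table = 'public.events';

-- Run periodically (e.g. from cron or pg_cron) to create new partitions
-- and enforce retention.
SELECT partman.run_maintenance();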