I use the Greenplum database - a massively parallel (MPP) database built on Postgres. I have a table that is about 100 GB.
It holds data from 2019 up to today. The table is not explicitly ordered, but since we insert new data every day, it is roughly ordered by sales day. I would like to recreate this table, sorting the data before the insert. The table is currently compressed with quicklz compression and stored column-oriented. Sorting by a specific key should be beneficial because Greenplum supports RLE compression: the same values would be stored next to each other.
By recreating the table I hope to reclaim some space. Would this also have an impact on performance?
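For reference, this is a minimal sketch of the rebuild I have in mind (table, column, and distribution-key names are placeholders; older Greenplum versions spell the option `appendonly` instead of `appendoptimized`):

```sql
-- Rebuild the table sorted by sales day before loading.
CREATE TABLE sales_sorted
WITH (appendoptimized=true, orientation=column,
      compresstype=quicklz, compresslevel=1)
AS SELECT * FROM sales
ORDER BY sales_date
DISTRIBUTED BY (sale_id);

-- Swap the tables and reclaim the old storage.
DROP TABLE sales;
ALTER TABLE sales_sorted RENAME TO sales;
```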
CodePudding user response:
Using RLE (which Greenplum combines with delta compression internally) would definitely be beneficial for your table. Query performance should also improve, since a better compression ratio means less I/O.
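In column-oriented tables, RLE is enabled per column via an `ENCODING` clause. A sketch, with hypothetical table and column names, applying RLE to the column the data is sorted by and quicklz to the rest:

```sql
-- RLE_TYPE works best on the sorted, repetitive column;
-- its compresslevel ranges from 1 to 4.
CREATE TABLE sales (
    sale_id    bigint,
    sales_date date ENCODING (compresstype=RLE_TYPE, compresslevel=2),
    amount     numeric
)
WITH (appendoptimized=true, orientation=column,
      compresstype=quicklz, compresslevel=1)
DISTRIBUTED BY (sale_id);
```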
CodePudding user response:
Greenplum also has a newer compression option that can outperform quicklz: you can try Zstandard (zstd) compression. Sorting by a column can definitely improve the compression ratio further.
Regarding performance, it depends on what your current bottlenecks are.
In general, more compression is a good thing, but if your goal is query performance, other factors may matter more than the compression ratio.
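A sketch of the same rebuild using zstd instead of quicklz (names are placeholders; zstd is available in Greenplum 6 and later, with `compresslevel` from 1 to 19):

```sql
-- Rebuild sorted and compressed with zstd; higher levels
-- compress better but load more slowly.
CREATE TABLE sales_zstd
WITH (appendoptimized=true, orientation=column,
      compresstype=zstd, compresslevel=5)
AS SELECT * FROM sales
ORDER BY sales_date
DISTRIBUTED BY (sale_id);
```

You can compare the resulting on-disk sizes with `SELECT pg_size_pretty(pg_total_relation_size('sales_zstd'));` before deciding which codec to keep.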