In the data that I have, there are around 1M rows, each with around 60-70 columns. However only few rows(20-30) will have columns beyond 30 filled, i.e, the table is sparse.Also columns beyond 30 are rarely queried.
Does "number of columns" impact performance?
Should I make two tables? one with first 30 columns and the second table is the original table.
or should I keep the original structure?
Table schema :- Table entity_table ( entity_id int, tag_1 text, tag_2 text, . . . tag_30 text, --upto col. 30 table is dense tag_31 text, . . . tag_70 text --sparse columns );
Also, does the type of these columns affect performance. Does postgres index null values, how to prevent that?
CodePudding user response:
Does "number of columns" impact performance? Short answer is "Yes, but don't worry about it."
More precisely, it eats space and that space has to go to and from disk, eats cache, etc. all of which costs resources. The exact amount of space depends on the column and is available alongside each data type in the postgres docs for data types: https://www.postgresql.org/docs/14/datatype.html
As Frank Heikens commented, a million rows isn't a lot these days. At 70 columns, 8 bytes per column for a million rows you'd be looking at ~560M which will happily fit in memory on a Raspberry PI so shouldn't be that big of a deal.
However, when you get to billions or trillions of rows all those little bytes really start adding up. Hence you might look at:
- Splitting up the table - however, if this results in more joins you could find the overall performance gets worse not better
- Using smaller column types (e.g. smallint rather than int)
- Reordering columns - see Calculating and saving space in PostgreSQL However, I wouldn't recommend this as a starting point - design for readability first, then performance
- Columnar storage https://en.wikipedia.org/wiki/Column-oriented_DBMS for which there are some postgres options which I don't have direct experience of but are potentially worth looking at e.g. https://www.buckenhofer.com/2021/01/postgresql-columnar-extension-cstore_fdw/