I have a question regarding the exact understanding of how PostgreSQL executes a query. Since it's row based it shouldn't matter whether you have a query like Select * from lineitem
or Select l_orderkey from lineitem
performance wise? Further why is there a difference in measured time between Select count(*) from lineitem
and Select * from lineitem
since the full table has to be scanned in both cases?
I am measuring the performance like that:
long starttime=System.CurrenMillis();
Statement a = conn.createStatement();
a.setFetchsize(10000);
Resultset rs = a.executeQuery(Query);
while(rs.next()){}
long endTime=System.currentMillis();
System.out.println((endTime-starttime));
I have to set a fetchsize otherwise there will be a Java Heap Space Error, since the table are quite large.
In Addition i was trying to measure the difference to horizontal partitioned table. For that i splitted the table order by orderstatus='O', resulting two tables (ordersO, ordersF) which have both the same size. When comparing the queries Select * from ordersO
and Select * from orders where o_orderstatus='O'
, i was expecting that the first query on the partition only needs half of the time, since it contains only half of the tupel? But it wasnt, the measured time was more or less equal.
Thank you in Advance !
CodePudding user response:
There are several questions here.
Will SELECT col FROM tab
perform better than SELECT * FROM tab
?
Yes, it will, for two reasons:
PostgreSQL doesn't have to extract all columns from the row. If the column is early in the list of columns, that is an obvious win. But we also win, because we don't have to “de-toast” oversized attributes that are stored out of line in a TOAST table.
You have to transfer less data to the client, and the client has to process less data.
Why is SELCT count(*) FROM tab
faster than SELECT * FROM tab
?
Mostly because it does not have to transfer a lot of data to the client. In addition, it has to do less processing on each table row on the server.
Why are SELECT * FROM tab_42
and SELECT * FROM tab WHERE partkey = 42
equally fast?
That's because of partition pruning. PostgreSQL will restrict scans to the appropriate table partitions if the partitioning key is in the WHERE
clause.