I ran this:
EXPLAIN select id, birth_date, ROW_NUMBER() OVER (ORDER BY 1) AS load_id from user_profile;
and I see this:
WindowAgg (cost=0.00..133833424.40 rows=30901176 width=36)
-> Seq Scan on user_profile (cost=0.00..133369906.76 rows=30901176 width=28)
What does this query plan mean?
CodePudding user response:
The query plan is the execution plan that the PostgreSQL planner (Amazon Redshift is based on PostgreSQL) has generated for the your SQL statement.
The first node is a window aggregation (WindowAgg) over the data as you're using the OVER
window function to calculate a row number.
The second node is a sequential scan (Seq Scan) on the user_profile
table, as you're doing a full select
of the table without any filtering.
A sequential scan scans the entire table as stored on disk since your query requires a full traversal of the table. Even if there is a multi-column index on id
& birth_date
, the query engine would pretty much always go for a sequence scan here as you need everything (depending on the random_page_cost
& enable_seqscan
parameters in PostgreSQL).
The cost
number is actually arbitrary, but conventionally means the number of disk page fetches; it's split into 2 values with the delimiter being ..
.
The first value shows the startup cost - this is the estimated cost to return the first row. The second value shows the total cost - this is the estimated cost to return all rows.
For example, for the Seq Scan, the startup cost is 0 and the total cost is estimated to be 133369906.76.
For sequential scans, the startup cost is usually 0. There's nothing really to do other than return data so it can start returning data right away. Total costs for a node includes the cost of all its child nodes as well - in this case, the final total cost of both operations looks to be 133833424.40
which is the sum of the scan and aggregation cost.
The rows
value demonstrates the estimated number of rows that will be returned. In this case, both operations have the same value as the aggregation will apply to all rows & no filtering is being carried out that will reduce the number of final rows.
The width
value demonstrates the estimated size in bytes of each returned row i.e. each row will most likely be 28 bytes in length before the aggregation and 36 bytes after the aggregation.
Putting that all together, you could read the query plan as such:
- Sequential Scan on table user_profile
- will most likely start returning rows immediately
- estimated disk page fetch count of 133369906.76
- estimated 30,901,176 rows to be returned
- estimated total row size of 28 bytes
- Window Aggregation on data from above operation
- will most likely start returning rows immediately
- estimated disk page fetch count of 133833424.40
- estimated 30,901,176 rows to be returned
- estimated total row size of 36 bytes