I would like to pivot the arrays in the cells inside the Postgres view pg_stats
. The reason for this is to better understand what is going on with the table EXPLAIN ANALYZE
commands.
I have this SQL query that I am modifying without success in order to achieve the pivot:
SELECT
-- ps.schemaname,
-- ps.tablename,
ps.attname as "column_name",
ps.null_frac,
ps.avg_width,
ps.n_distinct,
ps.most_common_vals,
ps.most_common_freqs,
ps.histogram_bounds,
ps.correlation -- , -- I think this means how close the stats are to the "real rows in the entire table"
-- unnest(ps.most_common_vals, ps.histogram_bounds)
FROM pg_stats AS ps
CROSS JOIN LATERAL unnest(ps.most_common_vals, ps.histogram_bounds)
WHERE
ps.tablename = 'my_table_name'
AND ps.attname IN ('my_important_column_1', 'my_important_column_2');
However I hit the wall on all sorts of error messages due to the function unnest(...)
(see https://www.postgresql.org/docs/9.4/functions-array.html):
- If I use the function
unnest(...)
in a column inside theSELECT
then I see this error:
SQL Error [42883]: ERROR: function unnest(anyarray, anyarray) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 325
- If I use
unnest(...)
in theFROM
with thatCROSS JOIN LATERAL
, then I see this error:
SQL Error [42804]: ERROR: cannot determine element type of "anyarray" argument
On the types issues, I am interested in these columns of various types "array" (see https://www.postgresql.org/docs/current/view-pg-stats.html):
pg_stats.most_common_vals
typeanyarray
pg_stats.most_common_freqs
typefloat4[]
pg_stats.histogram_bounds
typeanyarray
If I use all of these columns inside unnest(...)
then because types are different (basically because of float4[]
I think), I see this error:
SQL Error [42883]: ERROR: function unnest(anyarray, real[], anyarray) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 387
Or
SQL Error [42804]: ERROR: cannot determine element type of "anyarray" argument
depending on where I put unnest
(as column, or as filter).
- How do I pivot all these arrays inside column values?
- How do I use casting to apply this pivot to arrays of different types?
I am a bit lost with the Postgres syntax here.
EDIT:
I could do stuff like this in the SELECT
: array_to_string(ps.most_common_vals, E'\n') as most_common_vals_lf,
But this leads to "fat cells" that don't render very well in a visual IDE e.g. DBeaver because the line feeds don't expand very well and all the output rows hide most of the cell value just because they don't expand. I really would like to pivot the view and multiply the common columns by the number N of elements in the arrays and get back all these as N rows.
CodePudding user response:
Turns out the issue was around casting the types of those arrays. This works when making sure the arrays types are ::text::text[]
:
SELECT
-- ps.schemaname,
-- ps.tablename,
ps.attname as "column_name",
mt.mcv AS "most_common_vals",
mt.mcf AS "most_common_freqs",
ps.avg_width,
ps.n_distinct,
ps.correlation,
ps.null_frac -- fraction of column entries that are null
FROM pg_stats AS ps
-- for scalar type columns:
CROSS JOIN LATERAL unnest(
ps.most_common_vals::text::text[],
ps.most_common_freqs::text::text[]
) AS mt(mcv, mcf)
WHERE
ps.tablename = 'my_table_name'
AND ps.attname IN ('my_important_column_1', 'my_important_column_2')
AND mt.mcv IS NOT NULL
AND mt.mcf IS NOT NULL
ORDER BY ps.attname, mt.mcf DESC;
The output is the expected pivot of the original view pg_stats
with one row for each element of those arrays.
One gotcha on ps_stats
: the pairs of most_common_vals
and most_common_freqs
can be treated as a map. So in this case it makes sense to pivot the table with both pivoted columns.
On the other hand the histogram column histogram_bounds
is independent from the above. It just shows the 10 bands to go from 0% to 100% in steps of 10%. It doesn't make sense to mix this column with the 2 columns above. This histogram column very much depend on the data we are dealing with, sometimes it doesn't even make sense to take this information into consideration at all.