Calculate Avg in for loop for columns in a table in PostgreSQL


I come from the Python world, where many things are colorful and easy. Now I'm trying to make my way into SQL because, well, I want to challenge myself outside of pandas and gain some important experience in SQL. With that said, here is my question. I have the following snippet:

do 
$do$
declare i varchar(50); 
declare average int; 
begin
    for i in (
        select column_name
        FROM information_schema.columns
        where table_schema = 'public'
        and table_name = 'example_table' 
        and column_name like '%suffix') loop 
            --raise notice 'Value: %', i; 
            select AVG(i) as average from example_table; 
            raise notice 'Value: %', i;
        end loop; 
end; 
$do$

As I learned from the PostgreSQL documentation, for loops are only possible in a do block, and certain variables have to be declared. I did this for the i variable, which contains the name of the column I want to iterate over. But I want to get the average of that column and add it as a row in a table with two columns: one for the feature (the i variable) and one for the average of that column. I thought that would be possible with my code snippet above, but I receive an error message that says "function avg(character varying) does not exist". When I use the function AVG outside of a for loop for a single column, it does retrieve the average value of this numeric column, but when I do it in a for loop, it says that this aggregate function does not exist. Could someone help me out with this, please?

UPDATE: I took a step back to make the story shorter:

select column_name
from information_schema.columns
where table_schema = 'public'
  and table_name = 'my_table'
  and column_name like '%wildcard';

This snippet yields a table with a column called column_name containing all the columns that fulfil the constraints stated in the where clause. I just want to add a column with the average value of each of those columns.

CodePudding user response:

If you only need it for a single table, you can use:

select x.col, avg(x.value::numeric)
from example_table t
 cross join lateral (
    select col, value
    from jsonb_each(to_jsonb(t)) as e(col, value)
    where jsonb_typeof(e.value) = 'number'
 ) x
group by x.col;

The "magic" is in converting each row from the table into a JSON value. This is what to_jsonb(t) does (t is the alias given to the table in the main query). So we get something like {"name": "Bla", "value": 3.14, "length": 10, "some_date": "2022-03-02"}. So each column name is a key in the JSON value.

This JSON value is then turned into one row per column (= key) using the jsonb_each() function, but only rows (= columns) whose value is a number are retained. So the derived table returns one row per column and per row of the table, and the outer query simply aggregates this per column. The drawback is that you need to write one query for each table.
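
To see what happens before the aggregation, you can run the derived table on its own. A minimal sketch, assuming example_table is a table containing the columns from the sample JSON above:

-- one row per (column, value) pair, with the JSON type of each value
select e.col, e.value, jsonb_typeof(e.value) as json_type
from example_table t
  cross join lateral jsonb_each(to_jsonb(t)) as e(col, value)
limit 10;

For the sample row above this would list name and some_date as string and value and length as number, so only the latter two survive the jsonb_typeof(e.value) = 'number' filter.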


If you need some kind of report for all tables in a schema, you can use a variation of this answer:

with all_selects as (
  select table_schema, table_name, 'select '||string_agg(format('avg(%I) as %I', column_name, column_name), ', ')||format(' from %I.%I', table_schema, table_name) as query
  from information_schema.columns
  where table_schema = 'public'
    and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
  group by table_schema, table_name
), all_aggregates as (
   select table_schema, table_name, 
          query_to_xml(query, true, true, '') as result
   from all_selects
)
select ag.table_schema, ag.table_name, r.column_name, nullif(r.average, '')::numeric as average
from all_aggregates ag
  cross join xmltable('/row/*' passing result
     columns column_name text path 'local-name()', 
             average text path '.') as r

This is a bit more tricky. The first part, all_selects, builds a query for each table in the schema public that applies the avg() aggregate to each column that can contain a number (the data_type in (...) condition).

So e.g. this returns a string select avg(value) as value, avg(length) as length from example_table

The next step is running each of these queries through query_to_xml() (sadly there is no built-in query_to_jsonb()).
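
For illustration, here is roughly what such a call looks like when run by hand on the generated query from above (just a sketch using the example query string; the second argument controls whether NULLs are included, the third asks for bare <row> elements instead of a full document, and the last one is the target namespace):

select query_to_xml('select avg(value) as value, avg(length) as length from example_table',
                    true,   -- include columns whose value is null
                    true,   -- tableforest: one <row> element per result row
                    '');    -- no target namespace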

query_to_xml() would return something like:

<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <value>12.345</value>
  <length>42</length>
</row>

So there is one tag for each column, containing the result of the avg(..) aggregate for that column.

The final select then uses xmltable() to turn each tag from the XML result into a row, returning the column name and its value.
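
As a minimal standalone sketch of that last step, with the XML literal taken from the example output above:

select r.column_name, nullif(r.average, '')::numeric as average
from xmltable('/row/*'
       passing '<row><value>12.345</value><length>42</length></row>'::xml
       columns column_name text path 'local-name()',
               average text path '.') as r;

The path 'local-name()' returns the name of each tag (i.e. the column name) and '.' returns its text content.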

Online example


Of course you can do this in PL/pgSQL as well:

do
$do$
declare
  l_rec record;
  l_sql text;
  l_average numeric;
begin
    for l_rec in
        select table_schema, table_name, column_name
        from information_schema.columns
        where table_schema = 'public'
          and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
    loop
      -- format() with %I quotes the identifiers, so the dynamic query is built safely
      l_sql := format('select avg(%I) from %I.%I', l_rec.column_name, l_rec.table_schema, l_rec.table_name);
      execute l_sql
         into l_average;
      raise notice 'Average for %.% is: %', l_rec.table_name, l_rec.column_name, l_average;
    end loop;
end;
$do$;

Note the condition on the column data_type to only process columns that can be averaged. This is however more costly, as it runs one query per column, not one per table.
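
And if you want to persist the results in a two-column table (feature, average) as described in the question, a sketch along the same lines could look like this; the column_averages table and its column names are just assumptions:

-- hypothetical target table: one row per (column, average)
create table if not exists column_averages (feature text primary key, average numeric);

do
$do$
declare
  l_rec record;
  l_average numeric;
begin
    for l_rec in
        select column_name
        from information_schema.columns
        where table_schema = 'public'
          and table_name = 'example_table'
          and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
    loop
      execute format('select avg(%I) from example_table', l_rec.column_name)
         into l_average;
      -- store or refresh the average for this column
      insert into column_averages (feature, average)
      values (l_rec.column_name, l_average)
      on conflict (feature) do update set average = excluded.average;
    end loop;
end;
$do$;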
