Postgres - select non-blank non-null values from multiple ordered rows-CodePudding

There are lots of data coming from multiple sources that I need to group based on priority, but the data quality from those sources is different - they may be missing some data. The task is to group that data into a separate table, in as complete as possible way.

For example:

create table grouped_data (
  id serial primary key,
  type text,
  a text,
  b text,
  c int
);

create table raw_data (
  id serial primary key,
  type text,
  a text,
  b text,
  c int,
  priority int
);


insert into raw_data
(type, a,       b,         c,   priority)
values
('one', null,    '',        123, 1),
('one', 'foo',   '',        456, 2),
('one', 'bar',   'baz',     789, 3),
('two', null,    'two-b',   11,  3),
('two', '',      '',        33,  2),
('two', null,    'two-bbb', 22,  1);

Now I need to group records by type, order by priority, take the first non-null and non-empty value, and put it into grouped_data. In this case, value of a for group one would be foo because the row that holds that value have a higher priority than the one with bar. And c should be 123, as it has the highest prio. Same for group two, for each column we take the data that is non-null, non-empty, and has the highest priority, or fallback to null if no actual data present.

In the end, grouped_data is expected to have the following content:

('one', 'foo', 'baz',     123),
('two', null,  'two-bbb', 22)

I've tried grouping, sub-selects, MERGE, cross joins... Alas, my knowledge of PostgreSQL is not good enough to get it working. One thing I'd like to avoid, too - is going through columns one-by-one, since in the real world there are few dozens of columns to work with...

A link to a fiddle I've been using to mess around with this: http://sqlfiddle.com/#!17/76699/1

CodePudding user response：

You should try this:

SELECT
       type,
       (array_agg(a ORDER BY priority ASC) FILTER (WHERE a IS NOT NULL AND a != ''))[1] as a,
       (array_agg(b ORDER BY priority ASC) FILTER (WHERE b IS NOT NULL AND b != ''))[1] as b,
       (array_agg(c ORDER BY priority ASC) FILTER (WHERE c IS NOT NULL))[1] as c
FROM raw_data GROUP BY type ORDER BY type;

CodePudding user response：

you can use window function first_value:

select distinct 
    type 
  , first_value(a) over (partition by type order by nullif(a,'') is null, priority) as a
  , first_value(b) over (partition by type order by nullif(b,'') is null, priority)  as b
  , first_value(c) over (partition by type order by priority) as c
from raw_data

CodePudding user response：

select distinct on (type) type, 
  first_value(a) over (partition by type order by (nullif(a, '') is null), priority) a, 
  first_value(b) over (partition by type order by (nullif(b, '') is null), priority) b, 
  first_value(c) over (partition by type order by (c is null), priority) c
from raw_data;

CodePudding user response：

This should also work.

WITH types(type) AS (
  SELECT DISTINCT
    type
  FROM raw_data
)
SELECT
  type,
  (SELECT a FROM raw_data WHERE a > '' AND raw_data.type = types.type ORDER BY priority LIMIT 1) AS a,
  (SELECT b FROM raw_data WHERE b > '' AND raw_data.type = types.type ORDER BY priority LIMIT 1) AS b,
  (SELECT c FROM raw_data WHERE c IS NOT NULL AND raw_data.type = types.type ORDER BY priority LIMIT 1) AS c
FROM types
ORDER BY type;