Home > Software engineering >  PostgreSQL data transformation - Turn rows into columns
PostgreSQL data transformation - Turn rows into columns

Time:10-18

I have a table whose structure looks like the following:

k | i | p | v

Notice that the key (k) is not unique, there are no keys, nothing. Each key can have multiple attributes (i = 0, 1, 2, ...) which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).

What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:

k | attr_name1 | attr_name2

I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:

SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2 
FROM Table fn 
  LEFT JOIN Table a ON fn.k = a.k 
              AND a.p = 'attr_name2' 
WHERE fn.p = 'attr_name1'

I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.

Any help would be appreciated.

Example:

k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34

Would turn into (assuming I am only interested in columns a, b, c):

k a  b  c
1 10 12 34
2 11 13 NULL

CodePudding user response:

I would use conditional aggregation. That is, an aggregate function around a CASE expression.

SELECT
  k,
  MAX(CASE WHEN p='a' THEN v END)   AS a,
  MAX(CASE WHEN p='b' THEN v END)   AS b,
  MAX(CASE WHEN p='c' THEN v END)   AS c
FROM
  your_table
GROUP BY
  k

This presumes that (k, p) is unique. If there are duplicate keys, this will clearly find the one v with the highest value (for each (k,p))

As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.

  • Related