I have the following table:
id | timestamp | attribute-1 | attribute-2 |
---|---|---|---|
1 | 8/12 | A | |
1 | 8/13 | B | |
2 | 8/12 | A | |
2 | 8/13 | C | B |
2 | 8/14 | B | |
3 | 8/12 | B | |
3 | 8/14 | C | C |
And would like to create a new table with the most updated attributes for each id, which means the most recent row for each id as long as it has both att-1 and att-2, if it doesn't I want to take the att from a previous row. should be like that:
id | attribute-1 | attribute-2 |
---|---|---|
1 | B | A |
2 | B | B |
3 | C | C |
- The timestamp is a real psql timestamp
CodePudding user response:
You can use FIRST_VALUE()
window function:
SELECT DISTINCT id,
FIRST_VALUE(attribute1) OVER (PARTITION BY id ORDER BY attribute1 IS NULL, timestamp DESC) attribute1,
FIRST_VALUE(attribute2) OVER (PARTITION BY id ORDER BY attribute2 IS NULL, timestamp DESC) attribute2
FROM tablename;
See the demo.
CodePudding user response:
I created a table with following data as per your input
CREATE TABLE data (
"id" INTEGER,
"timestamp" DATE,
"attribute1" VARCHAR(1),
"attribute2" VARCHAR(1)
);
INSERT INTO data
("id", "timestamp", "attribute1", "attribute2")
VALUES
('1', '2021-08-12', null, 'A'),
('1', '2021-08-13', 'B', null),
('2', '2021-08-12', null, 'A'),
('2', '2021-08-13', 'C', 'B'),
('2', '2021-08-14', 'B', null),
('3', '2021-08-12', 'B', null),
('3', '2021-08-14', 'C', 'C');
I think you can achieve your result by aggregating data and picking the first result in that:
SELECT
id, MAX(timestamp) AS timestamp_max,
(array_remove(array_agg(attribute1 ORDER BY timestamp DESC), NULL))[1] AS attribute1_agg,
(array_remove(array_agg(attribute2 ORDER BY timestamp DESC), NULL))[1] AS attribute1_agg
FROM data
GROUP BY id
ORDER BY id ASC;
which gives this output:
id | timestamp_max | attribute1_agg | attribute1_agg |
---|---|---|---|
1 | 2021-08-13T00:00:00.000Z | B | A |
2 | 2021-08-14T00:00:00.000Z | B | B |
3 | 2021-08-14T00:00:00.000Z | C | C |