Home > other >  Postgresql Joining Time Series With Missing Data
Postgresql Joining Time Series With Missing Data

Time:08-03

I am trying to outer join multiple time series tables in PostgreSQL on multiple conditions - which include the date column and several other identifier columns.

However the tables do not have continuous time series i.e. some dates are missing for some of the join conditions. Furthermore I don't want "duplicate" table specific new columns to be added for a row when there is not match

I have tried COALESCE() on the dates to fill in missing dates, which is fine. However it is the subsequent joins that are causing me problems. I also can't assume that one of the tables will have rows for all the dates required.

I thought perhaps to use generate series for a date range, with empty columns (? if possible) and then join all the tables on to that?

Please see example below:

I want to join Table A and Table B on columns date, identifier_1 and identifier_2 as an outer join. However where a value is not matched I do not want new columns to be added e.g. table_b_identifier_1.

Table A

id1 and id2 are missing rows on the 03/07 and 04/07, and id1 is also missing a row for the 05/07.

enter image description here

Table B

id2 is missing a row on the 02/07

enter image description here

Desired Output:

Essentially it is a conditional join. If there is a row in both tables for identifier_1 and

enter image description here

CodePudding user response:

It's not clear what is wrong with your attempt to use COALESCE to fill columns data, but it works well as intended in such a query

SELECT 
    COALESCE(a.date, b.date) AS date,
    COALESCE(a.identifier_1, b.identifier_1) AS identifier_1,
    COALESCE(a.identifier_2, b.identifier_2) AS identifier_2,
    a.value_a,
    b.value_b
FROM table_a a
FULL JOIN table_b b ON a.date = b.date 
                       AND a.identifier_1 = b.identifier_1 
                       AND a.identifier_2 = b.identifier_2

Please, check a demo

  • Related