Home > front end >  Choose row from next nearest neighbour where closest doesn't exist
Choose row from next nearest neighbour where closest doesn't exist

Time:01-23

I have three tables, one with temperature data with a date, one with buildings nearby the temperature stations, and one with unique ID's for the weather stations and the buildings, plus a ranking of 1, 2, 3 for the three nearest weather stations to the buildings. What I'd like to do is create a join that brings over the date and temperature from the nearest station to each building, and if the nearest station didn't record a temperature on that day then take the temperature from the next nearest weather station.

I'm pretty sure I'd use something like "Ranking = 1 AND NOT EXISTS" but I'm unsure how to tell it I want every date and if it can't find a certain date for a weather station then look at the next nearest one.

The ideal output is every buildings ID to have a temperature record for every date.

The weather table will occasionally be missing data for certain dates at certain sites so what I'd like to do is to go to the next nearest site when a date is missing from the closest site.

Building Table

UID TYPE DESC Loc
1 House Terrace London
2 Apartments 5 Floors Manchester
3 Office 12 Floors Birmingham
4 Workshop Detached Glasgow

Station Table

UID Date Temp Loc
1 01/01/2018 1 London
2 02/01/2018 1 London
3 03/01/2018 2 London
4 04/01/2018 1 London

Spatial Join Table

Build ID Station ID Distance Rank
1 1 5km 1
2 2 5km 1
3 3 12km 2
4 4 18km 3

CodePudding user response:

You have not described what you expect the results to be but for your sample data there is a 1-to-1-to-1 correspondence from building-to-spatial-join-to-station and you can simply use an INNER JOIN (there is no need to use any rankings):

SELECT *
FROM   building b
       INNER JOIN spatial_join sj
       ON (b."UID" = sj.build_id)
       INNER JOIN station s
       ON (sj.station_id = s."UID")

Which, for the sample data, outputs:

UID TYPE DESC LOC BUILD_ID STATION_ID DISTANCE RANK UID DATE TEMP LOC
1 House Terrace London 1 1 5km 1 1 2018-01-01 00:00:00 1 London
2 Apartments 5 Floors Manchester 2 2 5km 1 2 2018-01-02 00:00:00 1 London
3 Office 12 Floors Birmingham 3 3 12km 2 3 2018-01-03 00:00:00 2 London
4 Workshop Detached Glasgow 4 4 18km 3 4 2018-01-04 00:00:00 1 London

fiddle

CodePudding user response:

You could try to do something like here:
SAMPLE DATA

WITH
    buildings (B_ID, B_TYPE, B_DESC, B_LOC) AS
        (
            Select 1, 'House', 'Terrace', 'London' From Dual Union All
            Select 2, 'Appartments', '5 Floors', 'Manchester' From Dual Union All
            Select 3, 'Office', '12 Floors', 'Birmingham' From Dual Union All
            Select 4, 'Workshop', 'Detached', 'Glasgow' From Dual 
        ),
    stations (S_ID, S_DATE, S_TEMP, S_LOC) AS 
        (
            Select 1, DATE '2018-01-01', 1, 'London' From Dual Union All
            Select 2, DATE '2018-01-02', 1, 'London' From Dual Union All
            Select 3, DATE '2018-01-03', 2, 'London' From Dual Union All
            Select 4, DATE '2018-01-04', 1, 'London' From Dual 
        ),
    joins_tbl (B_ID, S_ID, DISTANCE, RANKED) AS
        (
            Select 1, 1, 5, 1 From Dual Union All
            Select 2, 2, 5, 1 From Dual Union All
            Select 3, 3, 12, 2 From Dual Union All
            Select 4, 4, 18, 3 From Dual 
        ),

First create CTE (I named it grid) that will construct rows having every building and in every date. The CTE also creates LOOKUP columns as an alternative station ID if self measuring didn't recorded. This part you should adjust to your data and preferences. Here is the grid:

    grid AS 
        (
            SELECT d.S_DATE "A_DATE", d.B_ID "BUILDING_ID", s.S_TEMP, s.S_ID,
                   (Select Min(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 1) "LOOKUP_ID_1",
                   (Select Min(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 2) "LOOKUP_ID_2",
                   (Select Min(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 3) "LOOKUP_ID_3",
                   --
                   (Select MAX(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 1) "LOOKUP_ID_4",
                   (Select MAX(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 2) "LOOKUP_ID_5",
                   (Select MAX(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 3) "LOOKUP_ID_6"
            FROM  (   Select  d.S_DATE, b.B_ID
                      From    stations d
                      Inner Join buildings b ON(1 = 1)
                  ) d
            INNER JOIN joins_tbl j ON(j.B_ID = d.B_ID)
            LEFT JOIN  stations s ON(s.S_DATE = d.S_DATE And s.S_ID = d.B_ID)
            
            ORDER BY d.B_ID, d.S_DATE
        )

--  R e s u l t :
A_DATE    BUILDING_ID     S_TEMP       S_ID LOOKUP_ID_1 LOOKUP_ID_2 LOOKUP_ID_3 LOOKUP_ID_4 LOOKUP_ID_5 LOOKUP_ID_6
--------- ----------- ---------- ---------- ----------- ----------- ----------- ----------- ----------- -----------
01-JAN-18           1          1          1           2           3           4           2           3           4 
02-JAN-18           1                                 2           3           4           2           3           4 
03-JAN-18           1                                 2           3           4           2           3           4 
04-JAN-18           1                                 2           3           4           2           3           4 
01-JAN-18           2                                 1           3           4           1           3           4 
02-JAN-18           2          1          2           1           3           4           1           3           4 
03-JAN-18           2                                 1           3           4           1           3           4 
04-JAN-18           2                                 1           3           4           1           3           4 
01-JAN-18           3                                 1                       4           2                       4 
02-JAN-18           3                                 1                       4           2                       4 
03-JAN-18           3          2          3           1                       4           2                       4 
04-JAN-18           3                                 1                       4           2                       4 
01-JAN-18           4                                 1           3                       2           3             
02-JAN-18           4                                 1           3                       2           3             
03-JAN-18           4                                 1           3                       2           3             
04-JAN-18           4          1          4           1           3                       2           3           

Now you could use COALESCE to get your missing data, and, if you want to know the station that is the source of displayed data, you could have it too. Here is the main SQL:

Select 
    A_DATE, BUILDING_ID, 
    COALESCE(S_TEMP,
              ( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_1  ),
                ( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_2  ),
                  ( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_3  ),
              ( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_4  ),
                ( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_5  ),
                  ( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_6  )
            ) "TEMP",
    COALESCE(CASE WHEN S_TEMP Is Not Null THEN 'SELF' END,
              ( Select 'LOOKUP_ID_1 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_1  ),
                ( Select 'LOOKUP_ID_2 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_2  ),
                  ( Select 'LOOKUP_ID_3 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_3  ),
              ( Select 'LOOKUP_ID_4 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_4  ),
                ( Select 'LOOKUP_ID_5 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_5  ),
                  ( Select 'LOOKUP_ID_6 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_6  )
            ) "SOURCE"
From 
    grid g

A_DATE    BUILDING_ID       TEMP SOURCE                                                            
--------- ----------- ---------- -------------------------------------------------------------------
01-JAN-18           1          1 SELF                                                                
02-JAN-18           1          1 LOOKUP_ID_1 - STATION ID = 2                                        
03-JAN-18           1          2 LOOKUP_ID_2 - STATION ID = 3                                        
04-JAN-18           1          1 LOOKUP_ID_3 - STATION ID = 4                                        
01-JAN-18           2          1 LOOKUP_ID_1 - STATION ID = 1                                        
02-JAN-18           2          1 SELF                                                                
03-JAN-18           2          2 LOOKUP_ID_2 - STATION ID = 3                                        
04-JAN-18           2          1 LOOKUP_ID_3 - STATION ID = 4                                        
01-JAN-18           3          1 LOOKUP_ID_1 - STATION ID = 1                                        
02-JAN-18           3          1 LOOKUP_ID_4 - STATION ID = 2                                        
03-JAN-18           3          2 SELF                                                                
04-JAN-18           3          1 LOOKUP_ID_3 - STATION ID = 4                                        
01-JAN-18           4          1 LOOKUP_ID_1 - STATION ID = 1                                        
02-JAN-18           4          1 LOOKUP_ID_4 - STATION ID = 2                                        
03-JAN-18           4          2 LOOKUP_ID_2 - STATION ID = 3                                        
04-JAN-18           4          1 SELF                                                   
  • Related