I have three tables, one with temperature data with a date, one with buildings nearby the temperature stations, and one with unique ID's for the weather stations and the buildings, plus a ranking of 1, 2, 3 for the three nearest weather stations to the buildings. What I'd like to do is create a join that brings over the date and temperature from the nearest station to each building, and if the nearest station didn't record a temperature on that day then take the temperature from the next nearest weather station.
I'm pretty sure I'd use something like "Ranking = 1 AND NOT EXISTS" but I'm unsure how to tell it I want every date and if it can't find a certain date for a weather station then look at the next nearest one.
The ideal output is every buildings ID to have a temperature record for every date.
The weather table will occasionally be missing data for certain dates at certain sites so what I'd like to do is to go to the next nearest site when a date is missing from the closest site.
Building Table
UID | TYPE | DESC | Loc |
---|---|---|---|
1 | House | Terrace | London |
2 | Apartments | 5 Floors | Manchester |
3 | Office | 12 Floors | Birmingham |
4 | Workshop | Detached | Glasgow |
Station Table
UID | Date | Temp | Loc |
---|---|---|---|
1 | 01/01/2018 | 1 | London |
2 | 02/01/2018 | 1 | London |
3 | 03/01/2018 | 2 | London |
4 | 04/01/2018 | 1 | London |
Spatial Join Table
Build ID | Station ID | Distance | Rank |
---|---|---|---|
1 | 1 | 5km | 1 |
2 | 2 | 5km | 1 |
3 | 3 | 12km | 2 |
4 | 4 | 18km | 3 |
CodePudding user response:
You have not described what you expect the results to be but for your sample data there is a 1-to-1-to-1 correspondence from building-to-spatial-join-to-station and you can simply use an INNER JOIN
(there is no need to use any rankings):
SELECT *
FROM building b
INNER JOIN spatial_join sj
ON (b."UID" = sj.build_id)
INNER JOIN station s
ON (sj.station_id = s."UID")
Which, for the sample data, outputs:
UID | TYPE | DESC | LOC | BUILD_ID | STATION_ID | DISTANCE | RANK | UID | DATE | TEMP | LOC |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | House | Terrace | London | 1 | 1 | 5km | 1 | 1 | 2018-01-01 00:00:00 | 1 | London |
2 | Apartments | 5 Floors | Manchester | 2 | 2 | 5km | 1 | 2 | 2018-01-02 00:00:00 | 1 | London |
3 | Office | 12 Floors | Birmingham | 3 | 3 | 12km | 2 | 3 | 2018-01-03 00:00:00 | 2 | London |
4 | Workshop | Detached | Glasgow | 4 | 4 | 18km | 3 | 4 | 2018-01-04 00:00:00 | 1 | London |
CodePudding user response:
You could try to do something like here:
SAMPLE DATA
WITH
buildings (B_ID, B_TYPE, B_DESC, B_LOC) AS
(
Select 1, 'House', 'Terrace', 'London' From Dual Union All
Select 2, 'Appartments', '5 Floors', 'Manchester' From Dual Union All
Select 3, 'Office', '12 Floors', 'Birmingham' From Dual Union All
Select 4, 'Workshop', 'Detached', 'Glasgow' From Dual
),
stations (S_ID, S_DATE, S_TEMP, S_LOC) AS
(
Select 1, DATE '2018-01-01', 1, 'London' From Dual Union All
Select 2, DATE '2018-01-02', 1, 'London' From Dual Union All
Select 3, DATE '2018-01-03', 2, 'London' From Dual Union All
Select 4, DATE '2018-01-04', 1, 'London' From Dual
),
joins_tbl (B_ID, S_ID, DISTANCE, RANKED) AS
(
Select 1, 1, 5, 1 From Dual Union All
Select 2, 2, 5, 1 From Dual Union All
Select 3, 3, 12, 2 From Dual Union All
Select 4, 4, 18, 3 From Dual
),
First create CTE (I named it grid) that will construct rows having every building and in every date. The CTE also creates LOOKUP columns as an alternative station ID if self measuring didn't recorded. This part you should adjust to your data and preferences. Here is the grid:
grid AS
(
SELECT d.S_DATE "A_DATE", d.B_ID "BUILDING_ID", s.S_TEMP, s.S_ID,
(Select Min(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 1) "LOOKUP_ID_1",
(Select Min(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 2) "LOOKUP_ID_2",
(Select Min(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 3) "LOOKUP_ID_3",
--
(Select MAX(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 1) "LOOKUP_ID_4",
(Select MAX(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 2) "LOOKUP_ID_5",
(Select MAX(S_ID) From joins_tbl Where B_ID <> d.B_ID And RANKED = 3) "LOOKUP_ID_6"
FROM ( Select d.S_DATE, b.B_ID
From stations d
Inner Join buildings b ON(1 = 1)
) d
INNER JOIN joins_tbl j ON(j.B_ID = d.B_ID)
LEFT JOIN stations s ON(s.S_DATE = d.S_DATE And s.S_ID = d.B_ID)
ORDER BY d.B_ID, d.S_DATE
)
-- R e s u l t :
A_DATE BUILDING_ID S_TEMP S_ID LOOKUP_ID_1 LOOKUP_ID_2 LOOKUP_ID_3 LOOKUP_ID_4 LOOKUP_ID_5 LOOKUP_ID_6
--------- ----------- ---------- ---------- ----------- ----------- ----------- ----------- ----------- -----------
01-JAN-18 1 1 1 2 3 4 2 3 4
02-JAN-18 1 2 3 4 2 3 4
03-JAN-18 1 2 3 4 2 3 4
04-JAN-18 1 2 3 4 2 3 4
01-JAN-18 2 1 3 4 1 3 4
02-JAN-18 2 1 2 1 3 4 1 3 4
03-JAN-18 2 1 3 4 1 3 4
04-JAN-18 2 1 3 4 1 3 4
01-JAN-18 3 1 4 2 4
02-JAN-18 3 1 4 2 4
03-JAN-18 3 2 3 1 4 2 4
04-JAN-18 3 1 4 2 4
01-JAN-18 4 1 3 2 3
02-JAN-18 4 1 3 2 3
03-JAN-18 4 1 3 2 3
04-JAN-18 4 1 4 1 3 2 3
Now you could use COALESCE to get your missing data, and, if you want to know the station that is the source of displayed data, you could have it too. Here is the main SQL:
Select
A_DATE, BUILDING_ID,
COALESCE(S_TEMP,
( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_1 ),
( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_2 ),
( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_3 ),
( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_4 ),
( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_5 ),
( Select S_TEMP From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_6 )
) "TEMP",
COALESCE(CASE WHEN S_TEMP Is Not Null THEN 'SELF' END,
( Select 'LOOKUP_ID_1 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_1 ),
( Select 'LOOKUP_ID_2 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_2 ),
( Select 'LOOKUP_ID_3 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_3 ),
( Select 'LOOKUP_ID_4 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_4 ),
( Select 'LOOKUP_ID_5 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_5 ),
( Select 'LOOKUP_ID_6 - STATION ID = ' || S_ID From stations Where S_DATE = g.A_DATE And S_ID = g.LOOKUP_ID_6 )
) "SOURCE"
From
grid g
A_DATE BUILDING_ID TEMP SOURCE
--------- ----------- ---------- -------------------------------------------------------------------
01-JAN-18 1 1 SELF
02-JAN-18 1 1 LOOKUP_ID_1 - STATION ID = 2
03-JAN-18 1 2 LOOKUP_ID_2 - STATION ID = 3
04-JAN-18 1 1 LOOKUP_ID_3 - STATION ID = 4
01-JAN-18 2 1 LOOKUP_ID_1 - STATION ID = 1
02-JAN-18 2 1 SELF
03-JAN-18 2 2 LOOKUP_ID_2 - STATION ID = 3
04-JAN-18 2 1 LOOKUP_ID_3 - STATION ID = 4
01-JAN-18 3 1 LOOKUP_ID_1 - STATION ID = 1
02-JAN-18 3 1 LOOKUP_ID_4 - STATION ID = 2
03-JAN-18 3 2 SELF
04-JAN-18 3 1 LOOKUP_ID_3 - STATION ID = 4
01-JAN-18 4 1 LOOKUP_ID_1 - STATION ID = 1
02-JAN-18 4 1 LOOKUP_ID_4 - STATION ID = 2
03-JAN-18 4 2 LOOKUP_ID_2 - STATION ID = 3
04-JAN-18 4 1 SELF