I have a PostGIS table that stores the locations a device has passed through in the following way:
ts | lat | lon |
---|---|---|
2022/12/23 10:04 | -3.2654 | -79.9234 |
2022/12/23 10:14 | -3.2654 | -79.9234 |
2022/12/23 10:24 | -2.95592 | -79.6833 |
As the device is transmitting every 10 minutes, there are a lot of duplicated locations and I want to filter the positions of the device to get the points that are separated at least by 10km of any other point the device has passed through.
I tried this SQL, and it filters the duplicates that are the exact same point, but I want to add the distance check to the other points of the table.
SELECT MIN(ts) AS arrivalTime, MAX(ts) AS departureTime, "lat", "lon" FROM "logsData"
WHERE "lat" IS NOT NULL
GROUP BY lat, lon
ORDER BY MAX(ts);
Is there a way to do this?
Thank you in advance
CodePudding user response:
Requirements
logsData
(ts
, lat
, lon
) can be expressed as gg1
, gg2
, gg3
.....ggN
. To report those data points that are 10km apart requires calculating distance as below:
distance(
gg1
,gg2
) >= 10km, reportgg1
,gg2
if not, calculate (gg1, ggX) until the distance >= 10km, then report
ggX
now use
ggX
as origin and find nextggY
so that distance(ggX
,ggY
) >= 10kmRepeat until all required data points are reported.
Before some SQL gurus come out an answer, here's a workaround.
Workaround
Calculate the distance between adjacent data points (in terms of 10-minute interval)
Calculate cumulative distance of each data point
Divide all data points into 10km segments: 0-10km, 10-20km, 20-30km....
Pick the first data point in each segment.
with cte_log_dist as (
select ts,
lat,
lon,
st_point(lon, lat) as gg,
lag(st_point(lon, lat),1) over (order by ts) as prev_gg,
coalesce(st_distance(lag(st_point(lon, lat),1) over (order by ts), st_point(lon, lat)),0) as distance
from logsdata),
cte_log_seg as (
select ts,
lat,
lon,
prev_gg,
gg,
distance,
sum(distance) over (order by ts) as total_distance,
floor(sum(distance) over (order by ts) / 10000) as ten_km_segment
from cte_log_dist),
cte_log_seg_rank as (
select ts,
lat,
lon,
prev_gg,
gg,
distance,
total_distance,
ten_m_segment,
row_number() over (partition by ten_km_segment) as rn
from cte_log_seg)
select *
from cte_log_seg_rank
where rn = 1;
The output is similar to this (Note the following test data are based on 10m segment for demonstration purpose)
ts |lat |lon |prev_gg |gg |distance |total_distance |ten_m_segment|rn|
----------------------- ---- ---- --------------------------------------------- --------------------------------------------- ------------------- ------------------ ------------- --
2022-01-01 00:00:00.000| 0.0|0.63| |POINT (0.6299999952316284 0) | 0.0| 0.0| 0.0| 1|
2022-01-01 00:50:00.000|0.01|3.03|POINT (0.2899999916553497 0.0099999997764826)|POINT (3.0299999713897705 0.0099999997764826)| 2.7399999797344208|12.300009636083564| 1.0| 1|
2022-01-01 02:00:00.000|0.02| 1.6|POINT (9.289999961853027 0.0099999997764826) |POINT (1.600000023841858 0.0199999995529652) | 7.6900064399587675| 26.25001606650559| 2.0| 1|
2022-01-01 02:10:00.000|0.02|5.79|POINT (1.600000023841858 0.0199999995529652) |POINT (5.789999961853027 0.0199999995529652) | 4.189999938011169| 30.44001600451676| 3.0| 1|
2022-01-01 03:00:00.000|0.03|0.57|POINT (9.539999961853027 0.0199999995529652) |POINT (0.5699999928474426 0.0299999993294477)| 8.970005543139631| 43.1600215476564| 4.0| 1|
2022-01-01 05:10:00.000|0.03|7.87|POINT (7.090000152587891 0.0299999993294477) |POINT (7.869999885559082 0.0299999993294477) | 0.7799997329711914| 50.46002144036804| 5.0| 1|
2022-01-01 05:50:00.000|0.04|8.95|POINT (6.559999942779541 0.0399999991059303) |POINT (8.949999809265137 0.0399999991059303) | 2.3899998664855957| 60.28003353049773| 6.0| 1|
2022-01-01 06:20:00.000|0.05|1.12|POINT (0.5299999713897705 0.0500000007450581)|POINT (1.1200000047683716 0.0500000007450581)| 0.5900000333786011| 70.39003935735751| 7.0| 1|
2022-01-01 09:00:00.000|0.06|0.02|POINT (9.5600004196167 0.0500000007450581) |POINT (0.0199999995529652 0.0599999986588955)| 9.540005661150024| 88.37004543335586| 8.0| 1|
2022-01-01 09:20:00.000|0.06|1.66|POINT (1.2599999904632568 0.0599999986588955)|POINT (1.659999966621399 0.0599999986588955) | 0.3999999761581421| 90.01004540042429| 9.0| 1|
2022-01-01 11:10:00.000|0.07|0.33|POINT (9.359999656677246 0.0599999986588955) |POINT (0.3300000131130219 0.0700000002980232)| 9.030005180663121|106.74005027114326| 10.0| 1|
CodePudding user response:
Run a self-join on the logsData so that there is a pairwise comparison of the locations, and then add a column to store the calculated distances. You may want to eliminate the duplicate locations before the self-join.
WITH
logsdata_merge (ts_a, lat_a, lon_a, ts_b, lat_b, lon_b)
AS
(
SELECT A.ts AS ts_a, A.lat AS lat_a, A.lon AS lon_a,
B.ts AS ts_b, B.lat AS lat_b, B.lon AS lon_b
FROM logsData A, logsData B
)
If unable to add a column to a CTE, create a new table instead of using a CTE. And then add the column for the distance values:
ALTER TABLE logsdata_merge ADD distance AS (...);
SELECT FROM logsdata_merge WHERE distance >= 10KM