Home > Software engineering >  How to get all points in table that are separated by 10 km
How to get all points in table that are separated by 10 km

Time:12-27

I have a PostGIS table that stores the locations a device has passed through in the following way:

ts lat lon
2022/12/23 10:04 -3.2654 -79.9234
2022/12/23 10:14 -3.2654 -79.9234
2022/12/23 10:24 -2.95592 -79.6833

As the device is transmitting every 10 minutes, there are a lot of duplicated locations and I want to filter the positions of the device to get the points that are separated at least by 10km of any other point the device has passed through.

I tried this SQL, and it filters the duplicates that are the exact same point, but I want to add the distance check to the other points of the table.

SELECT  MIN(ts) AS arrivalTime, MAX(ts) AS departureTime,  "lat", "lon" FROM "logsData" 
  WHERE "lat" IS NOT NULL  
  GROUP BY lat, lon
  ORDER BY MAX(ts);

Is there a way to do this?

Thank you in advance

CodePudding user response:

Requirements

logsData (ts, lat, lon) can be expressed as gg1, gg2, gg3.....ggN. To report those data points that are 10km apart requires calculating distance as below:

  1. distance(gg1, gg2) >= 10km, report gg1, gg2

  2. if not, calculate (gg1, ggX) until the distance >= 10km, then report ggX

  3. now use ggX as origin and find next ggY so that distance(ggX, ggY) >= 10km

  4. Repeat until all required data points are reported.

Before some SQL gurus come out an answer, here's a workaround.

Workaround

  1. Calculate the distance between adjacent data points (in terms of 10-minute interval)

  2. Calculate cumulative distance of each data point

  3. Divide all data points into 10km segments: 0-10km, 10-20km, 20-30km....

  4. Pick the first data point in each segment.

with cte_log_dist as (
select ts,
       lat,
       lon,       
       st_point(lon, lat) as gg,
       lag(st_point(lon, lat),1) over (order by ts) as prev_gg,
       coalesce(st_distance(lag(st_point(lon, lat),1) over (order by ts), st_point(lon, lat)),0) as distance
  from logsdata),
cte_log_seg as (
select ts,
       lat,
       lon,
       prev_gg,
       gg,
       distance,
       sum(distance) over (order by ts)                 as total_distance,
       floor(sum(distance) over (order by ts) / 10000)  as ten_km_segment 
  from cte_log_dist),
cte_log_seg_rank as (
select ts, 
       lat, 
       lon, 
       prev_gg, 
       gg, 
       distance, 
       total_distance, 
       ten_m_segment,
       row_number() over (partition by ten_km_segment) as rn
  from cte_log_seg)
select *
  from cte_log_seg_rank
 where rn = 1;

The output is similar to this (Note the following test data are based on 10m segment for demonstration purpose)

ts                     |lat |lon |prev_gg                                      |gg                                           |distance           |total_distance    |ten_m_segment|rn|
----------------------- ---- ---- --------------------------------------------- --------------------------------------------- ------------------- ------------------ ------------- -- 
2022-01-01 00:00:00.000| 0.0|0.63|                                             |POINT (0.6299999952316284 0)                 |                0.0|               0.0|          0.0| 1|
2022-01-01 00:50:00.000|0.01|3.03|POINT (0.2899999916553497 0.0099999997764826)|POINT (3.0299999713897705 0.0099999997764826)| 2.7399999797344208|12.300009636083564|          1.0| 1|
2022-01-01 02:00:00.000|0.02| 1.6|POINT (9.289999961853027 0.0099999997764826) |POINT (1.600000023841858 0.0199999995529652) | 7.6900064399587675| 26.25001606650559|          2.0| 1|
2022-01-01 02:10:00.000|0.02|5.79|POINT (1.600000023841858 0.0199999995529652) |POINT (5.789999961853027 0.0199999995529652) |  4.189999938011169| 30.44001600451676|          3.0| 1|
2022-01-01 03:00:00.000|0.03|0.57|POINT (9.539999961853027 0.0199999995529652) |POINT (0.5699999928474426 0.0299999993294477)|  8.970005543139631|  43.1600215476564|          4.0| 1|
2022-01-01 05:10:00.000|0.03|7.87|POINT (7.090000152587891 0.0299999993294477) |POINT (7.869999885559082 0.0299999993294477) | 0.7799997329711914| 50.46002144036804|          5.0| 1|
2022-01-01 05:50:00.000|0.04|8.95|POINT (6.559999942779541 0.0399999991059303) |POINT (8.949999809265137 0.0399999991059303) | 2.3899998664855957| 60.28003353049773|          6.0| 1|
2022-01-01 06:20:00.000|0.05|1.12|POINT (0.5299999713897705 0.0500000007450581)|POINT (1.1200000047683716 0.0500000007450581)| 0.5900000333786011| 70.39003935735751|          7.0| 1|
2022-01-01 09:00:00.000|0.06|0.02|POINT (9.5600004196167 0.0500000007450581)   |POINT (0.0199999995529652 0.0599999986588955)|  9.540005661150024| 88.37004543335586|          8.0| 1|
2022-01-01 09:20:00.000|0.06|1.66|POINT (1.2599999904632568 0.0599999986588955)|POINT (1.659999966621399 0.0599999986588955) | 0.3999999761581421| 90.01004540042429|          9.0| 1|
2022-01-01 11:10:00.000|0.07|0.33|POINT (9.359999656677246 0.0599999986588955) |POINT (0.3300000131130219 0.0700000002980232)|  9.030005180663121|106.74005027114326|         10.0| 1|

CodePudding user response:

Run a self-join on the logsData so that there is a pairwise comparison of the locations, and then add a column to store the calculated distances. You may want to eliminate the duplicate locations before the self-join.

WITH
  logsdata_merge (ts_a, lat_a, lon_a, ts_b, lat_b, lon_b)
  AS
  (
    SELECT A.ts AS ts_a, A.lat AS lat_a, A.lon AS lon_a, 
           B.ts AS ts_b, B.lat AS lat_b, B.lon AS lon_b
    FROM logsData A, logsData B
  )

If unable to add a column to a CTE, create a new table instead of using a CTE. And then add the column for the distance values:

ALTER TABLE logsdata_merge ADD distance AS (...);

SELECT FROM logsdata_merge WHERE distance >= 10KM
  • Related