Using sed (or other GNU/Linux tool) to find GPS within a bounding box?

Time:03-07

I'm looking to filter a very large CSV file down to a smaller one using a broad-stroke command-line tool.

The example data is here:

2021-03-19 09:37:00,LISBON,39.1660,-9.5114,18.5600,60.3886
2021-03-19 09:38:00,LISBON,38.8799,-9.3713,19.1051,27.9254
2021-03-19 09:39:00,LISBON,38.5964,-8.8315,19.1044,29.2456
2021-03-19 09:40:00,LISBON,38.4241,-8.9433,18.1184,35.7412
2021-03-19 09:41:00,LISBON,38.8015,-8.6765,17.7960,41.2380
2021-03-19 09:42:00,LISBON,38.4844,-9.0106,19.4660,27.1470
2021-03-19 09:43:00,LISBON,38.3213,-8.9620,19.7043,45.5808
2021-03-19 09:44:00,LISBON,38.9479,-9.1680,19.0704,26.8376
2021-03-19 09:45:00,LISBON,37.9198,-9.2775,17.8219,88.4726

The third and fourth fields here are GPS coordinates.

I'd like to be able to filter them down to within ~25 km of a central point 38.7077507, -9.1365919 and sed is very effective for this.

For example - sed -n '/38.7[2-4]..,-9.1[3-7]../p' gets pretty close.

HOWEVER, I'd like to make the 'bounding box' bigger, and this is where things get a bit confusing. For example, let's say I wanted to extend the longitude range all the way to -8.9. How do you write a regex for this?

I tried something like sed -n '/38.7[2-4]..,-[8-9]...../p', but the problem is that this returns '-8.1' which is too far, when I want to stop it at '-8.9'.
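
For reference, a numeric range like this can be spelled out with alternation in an extended regex. The following is only a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do) and coordinates with four decimals as in the sample above. The file names sample.csv and box.csv and the rows in them are made up for illustration.

```shell
# Hypothetical rows in the same layout as the example data.
cat > sample.csv <<'EOF'
2021-03-19 10:00:00,LISBON,38.7300,-9.1400,18.0000,30.0000
2021-03-19 10:01:00,LISBON,38.7400,-8.9500,18.0000,30.0000
2021-03-19 10:02:00,LISBON,38.7300,-8.1000,18.0000,30.0000
2021-03-19 10:03:00,LISBON,38.7300,-9.2000,18.0000,30.0000
2021-03-19 10:04:00,LISBON,38.8000,-9.0000,18.0000,30.0000
EOF

# Latitude 38.72 to 38.74; longitude -8.9x, -9.0x, or -9.10 to -9.17.
# Dots are escaped so they match a literal '.', and the commas pin the
# pattern to whole fields.
sed -E -n '/,38\.7[2-4][0-9]*,-(8\.9|9\.(0|1[0-7]))[0-9]*,/p' sample.csv > box.csv
cat box.csv
```

Only the 10:00 and 10:01 rows should come through. Every extra digit range needs its own branch, which is why this gets unwieldy quickly as the box grows.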

I know that if I got it into a richer language (e.g. Python) this is pretty straightforward, but I'd like to do as much as possible on the front end (before I ingest into the data pipeline), and sed is extremely performant for this.

Thanks!

CodePudding user response:

Wouldn't want to abuse sed for this, so here's an awk solution.

awk -F, '{x=38.7077507-$3; y=-9.1365919-$4; if(x^2+y^2<0.3^2) print}' input.txt
#           ^~~~~~~~~~ x     ^~~~~~~~~~ y              ^~~ r
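
Note that r above is a radius in degrees, and a degree of longitude is shorter than a degree of latitude at Lisbon's latitude, so the circle is not exactly 25 km. A rough sketch that works in kilometres instead, assuming an equirectangular approximation (about 111.32 km per degree of latitude, with longitude scaled by the cosine of the centre latitude). The variable names, the file names sample.csv and near.csv, and the two 10:0x rows are made up for illustration.

```shell
# Two rows from the question plus two hypothetical rows near the centre.
cat > sample.csv <<'EOF'
2021-03-19 09:38:00,LISBON,38.8799,-9.3713,19.1051,27.9254
2021-03-19 09:44:00,LISBON,38.9479,-9.1680,19.0704,26.8376
2021-03-19 10:00:00,LISBON,38.7500,-9.2000,18.0000,30.0000
2021-03-19 10:01:00,LISBON,38.7000,-8.9000,18.0000,30.0000
EOF

# lat0/lon0 = centre point in degrees, r = radius in km.
awk -F, -v lat0=38.7077507 -v lon0=-9.1365919 -v r=25 '
BEGIN {
    pi    = atan2(0, -1)             # awk has no built-in pi constant
    kmdeg = 111.32                   # approx. km per degree of latitude
    c     = cos(lat0 * pi / 180)     # shrink factor for longitude degrees
}
{
    dy = ($3 - lat0) * kmdeg         # north-south offset in km
    dx = ($4 - lon0) * kmdeg * c     # east-west offset in km
    if (dx*dx + dy*dy <= r*r) print  # keep rows inside the circle
}' sample.csv > near.csv
cat near.csv
```

With these rows only the two 10:0x lines are kept. The two rows taken from the question are both a little over 25 km from the centre. The approximation is fine at this scale but is not a substitute for a proper great-circle distance over long ranges.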