I have in my postgres database, the following table:
CREATE TABLE trips(id int, lat_init double precision, lon_init double precision,
lat_end double precision, lon_end double precision);
INSERT INTO trips(id, lat_init, lon_init, lat_end, lon_end)
VALUES (1,53.348060,-1.504068,53.360690,-1.457364),
(2,53.427651,-1.355329,53.380441,-1.536918),
(3,53.365660,-1.497837,53.363838,-1.388023),
(4,53.380349,-1.460028,53.429298,-1.359443),
(5,53.345526,-1.449195,53.407839,-1.501819);
So that (considering the example table values above):
SELECT * FROM trips;
id lat_init lon_init lat_end lon_end
1 53.34806 -1.504068 53.36069 -1.457364
2 53.427651 -1.355329 53.380441 -1.536918
3 53.36566 -1.497837 53.363838 -1.388023
4 53.380349 -1.460028 53.429298 -1.359443
5 53.345526 -1.449195 53.407839 -1.501819
In SQL, I can filter rows that start or end outside the bounding box defined by the coordinates (row 2 and 4):
- Upper Left: (53.42361, -1.57495),
- Lower Right: (53.32037, -1.38063)
Using:
select t.*
from trips t
where lat_init >= 53.32037 and lat_init <= 53.42361 and
lat_end >= 53.32037 and lat_end <= 53.42361 and
lon_init >= -1.57495 and lon_init <= -1.38063 and
lon_end >= -1.57495 and lon_end <= -1.38063;
id lat_init lon_init lat_end lon_end
1 53.34806 -1.504068 53.36069 -1.457364
3 53.36566 -1.497837 53.363838 -1.388023
5 53.345526 -1.449195 53.407839 -1.501819
I want to apply this filtering procedure in pandas. The trips
table is exported to a csv file called my-trips.csv
.
So that:
df = pd.read_csv('my-trips.csv')
df
id lat_init lon_init lat_end lon_end
0 1 53.348060 -1.504068 53.360690 -1.457364
1 2 53.427651 -1.355329 53.380441 -1.536918
2 3 53.365660 -1.497837 53.363838 -1.388023
3 4 53.380349 -1.460028 53.429298 -1.359443
4 5 53.345526 -1.449195 53.407839 -1.501819
How then do I perform the filtering as I do in the SQL above?
CodePudding user response:
Option 1
One way to do it is as follows
conditions = [(df['lat_init'] >= 53.32037) & (df['lat_init'] <= 53.42361),
(df['lat_end'] >= 53.32037) & (df['lat_end'] <= 53.42361),
(df['lon_init'] >= -1.57495) & (df['lon_init'] <= -1.38063),
(df['lon_end'] >= -1.57495) & (df['lon_end'] <= -1.38063)]
df_new = df[conditions[0] & conditions[1] & conditions[2] & conditions[3]]
[Out]:
id lat_init lon_init lat_end lon_end
0 1 53.348060 -1.504068 53.360690 -1.457364
2 3 53.365660 -1.497837 53.363838 -1.388023
4 5 53.345526 -1.449195 53.407839 -1.501819
A one-liner would look like the following
df_new = df[(df['lat_init'] >= 53.32037) & (df['lat_init'] <= 53.42361) & (df['lat_end'] >= 53.32037) & (df['lat_end'] <= 53.42361) & (df['lon_init'] >= -1.57495) & (df['lon_init'] <= -1.38063) & (df['lon_end'] >= -1.57495) & (df['lon_end'] <= -1.38063)]
[Out]:
id lat_init lon_init lat_end lon_end
0 1 53.348060 -1.504068 53.360690 -1.457364
2 3 53.365660 -1.497837 53.363838 -1.388023
4 5 53.345526 -1.449195 53.407839 -1.501819
Option 2
One can also use dataframe.query
as follows
df_new = df.query('lat_init >= 53.32037 and lat_init <= 53.42361 and lat_end >= 53.32037 and lat_end <= 53.42361 and lon_init >= -1.57495 and lon_init <= -1.38063 and lon_end >= -1.57495 and lon_end <= -1.38063')
[Out]:
id lat_init lon_init lat_end lon_end
0 1 53.348060 -1.504068 53.360690 -1.457364
2 3 53.365660 -1.497837 53.363838 -1.388023
4 5 53.345526 -1.449195 53.407839 -1.501819
CodePudding user response:
I post the code below should work:
import pandas as pd
df = pd.read_clipboard()
df = df[
df['lat_init'].between(53.32037, 53.42361, inclusive='both') &
df['lon_init'].between(-1.57495, -1.38063, inclusive='both') &
df['lat_end'].between(53.32037, 53.42361, inclusive='both') &
df['lon_end'].between(-1.57495, -1.38063, inclusive='both')
]
print(df)
id lat_init lon_init lat_end lon_end
0 1 53.348060 -1.504068 53.360690 -1.457364
2 3 53.365660 -1.497837 53.363838 -1.388023
4 5 53.345526 -1.449195 53.407839 -1.501819