Home > Net >  SQL to Pandas data filter equivalent
SQL to Pandas data filter equivalent

Time:10-11

I have in my postgres database, the following table:

CREATE TABLE trips(id int, lat_init double precision, lon_init double precision,
lat_end double precision, lon_end double precision);

INSERT INTO trips(id, lat_init, lon_init, lat_end, lon_end)
VALUES (1,53.348060,-1.504068,53.360690,-1.457364),
(2,53.427651,-1.355329,53.380441,-1.536918),
(3,53.365660,-1.497837,53.363838,-1.388023),
(4,53.380349,-1.460028,53.429298,-1.359443),
(5,53.345526,-1.449195,53.407839,-1.501819);

So that (considering the example table values above):

SELECT * FROM  trips;
id  lat_init     lon_init    lat_end      lon_end
1   53.34806    -1.504068   53.36069    -1.457364
2   53.427651   -1.355329   53.380441   -1.536918
3   53.36566    -1.497837   53.363838   -1.388023
4   53.380349   -1.460028   53.429298   -1.359443
5   53.345526   -1.449195   53.407839   -1.501819

In SQL, I can filter rows that start or end outside the bounding box defined by the coordinates (row 2 and 4):

  • Upper Left: (53.42361, -1.57495),
  • Lower Right: (53.32037, -1.38063)

Using:

select t.*
from trips t
where lat_init >= 53.32037 and lat_init <= 53.42361 and
      lat_end >= 53.32037 and lat_end <= 53.42361 and
      lon_init >= -1.57495 and lon_init <= -1.38063 and
     lon_end >= -1.57495 and lon_end <= -1.38063;

id  lat_init     lon_init    lat_end     lon_end
1   53.34806    -1.504068   53.36069    -1.457364
3   53.36566    -1.497837   53.363838   -1.388023
5   53.345526   -1.449195   53.407839   -1.501819

I want to apply this filtering procedure in pandas. The trips table is exported to a csv file called my-trips.csv.

So that:

df = pd.read_csv('my-trips.csv')
df
    id   lat_init    lon_init    lat_end      lon_end
0   1   53.348060   -1.504068   53.360690   -1.457364
1   2   53.427651   -1.355329   53.380441   -1.536918
2   3   53.365660   -1.497837   53.363838   -1.388023
3   4   53.380349   -1.460028   53.429298   -1.359443
4   5   53.345526   -1.449195   53.407839   -1.501819

How then do I perform the filtering as I do in the SQL above?

CodePudding user response:

Option 1

One way to do it is as follows

conditions = [(df['lat_init'] >= 53.32037) & (df['lat_init'] <= 53.42361),
                (df['lat_end'] >= 53.32037) & (df['lat_end'] <= 53.42361),
                (df['lon_init'] >= -1.57495) & (df['lon_init'] <= -1.38063),
                (df['lon_end'] >= -1.57495) & (df['lon_end'] <= -1.38063)]

df_new = df[conditions[0] & conditions[1] & conditions[2] & conditions[3]]

[Out]:

   id   lat_init  lon_init    lat_end   lon_end
0   1  53.348060 -1.504068  53.360690 -1.457364
2   3  53.365660 -1.497837  53.363838 -1.388023
4   5  53.345526 -1.449195  53.407839 -1.501819

A one-liner would look like the following

df_new = df[(df['lat_init'] >= 53.32037) & (df['lat_init'] <= 53.42361) & (df['lat_end'] >= 53.32037) & (df['lat_end'] <= 53.42361) & (df['lon_init'] >= -1.57495) & (df['lon_init'] <= -1.38063) & (df['lon_end'] >= -1.57495) & (df['lon_end'] <= -1.38063)]

[Out]:

   id   lat_init  lon_init    lat_end   lon_end
0   1  53.348060 -1.504068  53.360690 -1.457364
2   3  53.365660 -1.497837  53.363838 -1.388023
4   5  53.345526 -1.449195  53.407839 -1.501819

Option 2

One can also use dataframe.query as follows

df_new = df.query('lat_init >= 53.32037 and lat_init <= 53.42361 and lat_end >= 53.32037 and lat_end <= 53.42361 and lon_init >= -1.57495 and lon_init <= -1.38063 and lon_end >= -1.57495 and lon_end <= -1.38063')

[Out]:

   id   lat_init  lon_init    lat_end   lon_end
0   1  53.348060 -1.504068  53.360690 -1.457364
2   3  53.365660 -1.497837  53.363838 -1.388023
4   5  53.345526 -1.449195  53.407839 -1.501819

CodePudding user response:

I post the code below should work:

import pandas as pd
df = pd.read_clipboard()
df = df[
    df['lat_init'].between(53.32037, 53.42361, inclusive='both') &
    df['lon_init'].between(-1.57495, -1.38063, inclusive='both') &
    df['lat_end'].between(53.32037, 53.42361, inclusive='both') &
    df['lon_end'].between(-1.57495, -1.38063, inclusive='both')
    ]
print(df)


   id   lat_init  lon_init    lat_end   lon_end
0   1  53.348060 -1.504068  53.360690 -1.457364
2   3  53.365660 -1.497837  53.363838 -1.388023
4   5  53.345526 -1.449195  53.407839 -1.501819
  • Related