I have a question about how to use the pandas groupby() function with a filter applied. Please consider the following example df:
id c1 c2 c3
1 1 2 3
1 2 3 4
1 3 4 5
If I use this code:
import numpy as np

grouped_df = df.groupby(['id']).agg(
    Min_1=('c1', np.min),
    Min_2=('c2', np.min),
    Min_3=('c3', np.min))
grouped_df.columns = ['gc1','gc2','gc3']
I get the following result:
id gc1 gc2 gc3
1 1 2 3
But I want to add a filter when I do this group by. If I were using SQL, I would write something like this:
select id,
    min(c1) as gc1,
    min(c2) as gc2,
    min(c3) as gc3
from df_view
where (c1 between 2 and 3) and (c2 between 2 and 3) and (c3 between 2 and 3)
group by id
The result I want is as follows:
id gc1 gc2 gc3
1 2 2 3
So how can this be done using pandas?
I searched the documentation, but nowhere does it mention adding a filter when doing an aggregation in pandas.
CodePudding user response:
This does not use any additional pandas function, but one way to do this is to filter each column's values with a boolean mask before passing them to np.min. This plays the role of your where clause, applied to each column separately.
grouped_df = df.groupby(['id']).agg(
    Min_1=('c1', lambda x: np.min(x[(x >= 2) & (x <= 3)])),
    Min_2=('c2', lambda x: np.min(x[(x >= 2) & (x <= 3)])),
    Min_3=('c3', lambda x: np.min(x[(x >= 2) & (x <= 3)])))
grouped_df.columns = ['gc1','gc2','gc3']
grouped_df
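As a possible alternative to the lambdas, the same per-column filter can be sketched with a mask applied before grouping. This is only an illustration on the sample data from the question; the names cols, in_range, per_column and row_wise are made up for the example. It also shows the row-wise variant, which is closer to how a SQL where clause behaves (rows are dropped before grouping). On this sample no single row has c1, c2 and c3 all between 2 and 3, so the row-wise version comes back empty, while the per-column version gives 2, 2, 3.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "c1": [1, 2, 3],
    "c2": [2, 3, 4],
    "c3": [3, 4, 5],
})

cols = ["c1", "c2", "c3"]            # hypothetical helper names
new_names = dict(zip(cols, ["gc1", "gc2", "gc3"]))

# True where a value lies in [2, 3], evaluated cell by cell.
in_range = (df[cols] >= 2) & (df[cols] <= 3)

# Per-column filter: out-of-range cells become NaN, and min() skips NaN.
per_column = (
    df[cols].where(in_range)
    .groupby(df["id"])
    .min()
    .rename(columns=new_names)
)

# Row-wise filter: keep only rows where every column is in range,
# then group. This mirrors filtering rows before the aggregation.
row_wise = (
    df[in_range.all(axis=1)]
    .groupby("id")[cols]
    .min()
    .rename(columns=new_names)
)

print(per_column)
print(row_wise)
```

With the masked version, a group that has no in-range value for a column ends up with NaN in that column, and the results are floats because of the NaN masking.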