pandas drop rows based on condition on groupby

Time:11-19

I have a DataFrame like the one below:

[image: input DataFrame]

I am trying to group by the cell column and drop the rows with "NA" values in groups whose size is greater than 1. Required output:

[image: expected output]

How can I get my expected output? How do I filter on a condition and drop rows in a groupby statement?

CodePudding user response:

Use a boolean mask:

>>> df[df.groupby('cell').cumcount().eq(0) | df['value'].notna()]

  cell value   kpi
0    A  crud  thpt
1    A     6   ret
3    B   NaN   acc
4    D    hi   int

Details:

m1 = df.groupby('cell').cumcount().eq(0)  # True for the first row of each group
m2 = df['value'].notna()                  # True where value is not NaN
df.assign(keep_at_least_one=m1, keep_notna=m2, keep_rows=m1|m2)

# Output:
  cell value   kpi  keep_at_least_one  keep_notna  keep_rows
0    A  crud  thpt               True        True       True
1    A     6   ret              False        True       True
2    A   NaN  thpt              False       False      False
3    B   NaN   acc               True       False       True
4    D    hi   int               True        True       True
5    D   NaN    ps              False       False      False
6    D   NaN  yret              False       False      False
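
For anyone who wants to run this, here is a minimal self-contained sketch of the mask above. The sample data is reconstructed from the printed output (an assumption, since the original screenshot is not available), and the helper name `keep_first_or_notna` and the extra group "E" are hypothetical. It also illustrates one caveat: because the mask always keeps the first row of each group, a NaN that comes first in a multi-row group is kept as well.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output above (an assumption).
df = pd.DataFrame({
    "cell": ["A", "A", "A", "B", "D", "D", "D"],
    "value": ["crud", 6, np.nan, np.nan, "hi", np.nan, np.nan],
    "kpi": ["thpt", "ret", "thpt", "acc", "int", "ps", "yret"],
})

def keep_first_or_notna(frame):
    """Keep the first row of each 'cell' group plus every row with a non-NaN 'value'."""
    first_in_group = frame.groupby("cell").cumcount().eq(0)
    return frame[first_in_group | frame["value"].notna()]

result = keep_first_or_notna(df)
print(result.index.tolist())  # [0, 1, 3, 4]

# Hypothetical extra group "E" whose first row is NaN: that row survives too,
# even though the group has more than one row.
extra = pd.DataFrame({"cell": ["E", "E"], "value": [np.nan, 7], "kpi": ["x", "y"]})
edge = keep_first_or_notna(pd.concat([df, extra], ignore_index=True))
print(edge.index.tolist())  # [0, 1, 3, 4, 7, 8]
```

If NaN rows can appear first within a group, a mask based on the group size rather than on row position may be closer to the stated requirement.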

CodePudding user response:

Starting from your DataFrame, we first group by cell to get the size of each group:

>>> df_grouped = df.groupby(['cell'], as_index=False).size()
>>> df_grouped
    cell    size
0   A       3
1   B       1
2   D       3

Then we merge the result with the original DataFrame, so that every row carries the size of its group:

>>> df_merged = pd.merge(df, df_grouped, on='cell', how='left')
>>> df_merged
   cell value   kpi     size
0   A   5.0     thpt    3
1   A   6.0     ret     3
2   A   NaN     thpt    3
3   B   NaN     acc     1
4   D   8.0     int     3
5   D   NaN     ps      3
6   D   NaN     yret    3

Finally, we filter the DataFrame, dropping the rows that are NaN and belong to a group with more than one row, to get the expected result:

>>> df_filtered = df_merged[~((df_merged['value'].isna()) & (df_merged['size'] > 1))]
>>> df_filtered[['cell', 'value', 'kpi']]
    cell    value   kpi
0   A       5.0     thpt
1   A       6.0     ret
3   B       NaN     acc
4   D       8.0     int
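
The same filter can probably be written without the intermediate merge by broadcasting the group size with `transform('size')`. The following is a sketch, not a drop-in replacement: the sample data is reconstructed from the printed `df_merged` output above (an assumption), and the name `group_size` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed df_merged output above (an assumption).
df = pd.DataFrame({
    "cell": ["A", "A", "A", "B", "D", "D", "D"],
    "value": [5.0, 6.0, np.nan, np.nan, 8.0, np.nan, np.nan],
    "kpi": ["thpt", "ret", "thpt", "acc", "int", "ps", "yret"],
})

# Broadcast each group's row count back onto its own rows (same index as df).
group_size = df.groupby("cell")["cell"].transform("size")

# Drop a row only when its value is NaN and its group has more than one row.
df_filtered = df[~(df["value"].isna() & (group_size > 1))]
print(df_filtered)
```

As with the merge version, a group with several rows that are all NaN would be removed entirely. Whether that is desired depends on the data.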