Python Pandas: Select inverse of dataframe groupby

Time:05-11

Say I have a pandas DataFrame called df_all. I want to group by the column Foo and select only the last row of each group:

# E.g. 
# Foo Bar Baz
#  1   1   1
#  1   2   2
#  2   1   2
#  2   3   4
#  2   5   6
# Wish to select rows '1 2 2' and '2 5 6' since, if we group by Foo,
# they are the last row for each distinct Foo value
df_slice = df_all.groupby('Foo').last()

The above works. Now I want the set of rows that are in df_all but not in df_slice. This is what I tried:

dv_inverse = df[~df_slice.isin(df_all)].dropna(how = 'all')

CodePudding user response:

What about:

df_inverse = df_all[df_all.duplicated(subset='Foo', keep='last')] 
print(df_inverse)
   Foo  Bar  Baz
0    1    1    1
2    2    1    2
3    2    3    4
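
For anyone who wants to run this end to end, here is a minimal sketch. The construction of df_all is an assumption, rebuilt from the table in the question; the mask variable name is only for illustration.

```python
import pandas as pd

# Assumed reconstruction of df_all from the table in the question
df_all = pd.DataFrame({
    "Foo": [1, 1, 2, 2, 2],
    "Bar": [1, 2, 1, 3, 5],
    "Baz": [1, 2, 2, 4, 6],
})

# keep='last' leaves the last occurrence of each Foo value unmarked,
# so the mask is True for every row that is NOT the last of its group
mask = df_all.duplicated(subset="Foo", keep="last")

df_inverse = df_all[mask]   # rows that are not last in their group
df_slice = df_all[~mask]    # last row of each group

print(df_inverse)
print(df_slice)
```

Note that "last" here means last in the current row order of df_all, so sort the frame first if the order matters.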

CodePudding user response:

Another option is to use groupby.tail to construct df_slice; then use index.isin to filter df_all for dv_inverse:

df_slice = df_all.groupby('Foo').tail(1)
dv_inverse = df_all[~df_all.index.isin(df_slice.index)]

Output:

   Foo  Bar  Baz
0    1    1    1
2    2    1    2
3    2    3    4
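
The reason tail(1) is used here instead of last() is the index. A small sketch, again assuming df_all is rebuilt from the table in the question (by_last and by_tail are illustrative names): last() returns a frame indexed by the group key Foo, while tail(1) keeps the original row labels, which is what index.isin needs.

```python
import pandas as pd

# Assumed reconstruction of df_all from the table in the question
df_all = pd.DataFrame({
    "Foo": [1, 1, 2, 2, 2],
    "Bar": [1, 2, 1, 3, 5],
    "Baz": [1, 2, 2, 4, 6],
})

by_last = df_all.groupby("Foo").last()    # index holds the Foo values
by_tail = df_all.groupby("Foo").tail(1)   # index holds the original row labels

print(by_last.index.tolist())  # group keys, not row positions
print(by_tail.index.tolist())  # labels of the last row in each group

# Only the tail(1) index can be matched against df_all.index
dv_inverse = df_all[~df_all.index.isin(by_tail.index)]
print(dv_inverse)
```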

CodePudding user response:

Use Series.duplicated to build a boolean mask, then use boolean indexing: the mask selects dv_inverse and the inverted mask selects df_slice:

m = df_all['Foo'].duplicated(keep='last')
dv_inverse = df_all[m]
print(dv_inverse)
   Foo  Bar  Baz
0    1    1    1
2    2    1    2
3    2    3    4

df_slice = df_all[~m]
print(df_slice)
   Foo  Bar  Baz
1    1    2    2
4    2    5    6
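
As a quick sanity check, the two pieces should split df_all with no overlap and no missing rows. This is only a sketch, assuming df_all is rebuilt from the table in the question; the combined variable is an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of df_all from the table in the question
df_all = pd.DataFrame({
    "Foo": [1, 1, 2, 2, 2],
    "Bar": [1, 2, 1, 3, 5],
    "Baz": [1, 2, 2, 4, 6],
})

m = df_all["Foo"].duplicated(keep="last")
dv_inverse = df_all[m]
df_slice = df_all[~m]

# No row label appears in both pieces
print(dv_inverse.index.intersection(df_slice.index).empty)

# Putting the pieces back together recovers every row label of df_all
combined = pd.concat([dv_inverse, df_slice]).sort_index()
print(combined.index.tolist() == df_all.index.tolist())
```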