Say I have a pandas DataFrame called df_all. I then wish to group by column Foo and select only the last row of each group:
# E.g.
# Foo Bar Baz
# 1 1 1
# 1 2 2
# 2 1 2
# 2 3 4
# 2 5 6
# Wish to select rows '1 2 2' and '2 5 6' since if we group by Foo,
# they are the last row for each distinct Foo value
df_slice = df_all.groupby('Foo').last()
The above works. Now I wish to get the set of rows that are in df_all but not in df_slice. This is what I tried:
dv_inverse = df[~df_slice.isin(df_all)].dropna(how = 'all')
CodePudding user response:
What about:
df_inverse = df_all[df_all.duplicated(subset='Foo', keep='last')]
print(df_inverse)
Foo Bar Baz
0 1 1 1
2 2 1 2
3 2 3 4
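To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question. The way df_all is constructed and the intermediate name mask are assumptions for illustration; only the duplicated call comes from the answer above.

```python
import pandas as pd

# Sample data copied from the table in the question (construction is assumed).
df_all = pd.DataFrame(
    {"Foo": [1, 1, 2, 2, 2], "Bar": [1, 2, 1, 3, 5], "Baz": [1, 2, 2, 4, 6]}
)

# keep='last' marks every row as a duplicate except the last occurrence
# of each Foo value, so the mask selects exactly the "non-last" rows.
mask = df_all.duplicated(subset="Foo", keep="last")
df_inverse = df_all[mask]

print(mask.tolist())  # [True, False, True, True, False]
print(df_inverse)
```

Because the mask is built from df_all itself, the original row labels (0, 2, 3) are preserved and no comparison against df_slice is needed.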
CodePudding user response:
Another option is to use groupby.tail to construct df_slice, then use index.isin to filter df_all for dv_inverse:
df_slice = df_all.groupby('Foo').tail(1)
dv_inverse = df_all[~df_all.index.isin(df_slice.index)]
Output:
Foo Bar Baz
0 1 1 1
2 2 1 2
3 2 3 4
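The reason tail(1) fits here while last() does not is the index each one returns. The sketch below compares the two on the question's sample data; the frame construction and the names by_last and by_tail are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question (construction is assumed).
df_all = pd.DataFrame(
    {"Foo": [1, 1, 2, 2, 2], "Bar": [1, 2, 1, 3, 5], "Baz": [1, 2, 2, 4, 6]}
)

by_last = df_all.groupby("Foo").last()   # indexed by the group key Foo
by_tail = df_all.groupby("Foo").tail(1)  # keeps the original row labels

print(by_last.index.tolist())  # [1, 2] -> Foo values, not row labels
print(by_tail.index.tolist())  # [1, 4] -> row labels from df_all

# Only by_tail's index can be matched back against df_all.index.
dv_inverse = df_all[~df_all.index.isin(by_tail.index)]
print(dv_inverse.index.tolist())  # [0, 2, 3]
```

Note also that last() takes the last non-null value per column, so with missing data its result may not correspond to a single original row, whereas tail(1) always returns whole rows.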
CodePudding user response:
Use Series.duplicated to build a boolean mask and filter df_all with boolean indexing. The mask itself gives dv_inverse, and the inverted mask gives df_slice:
m = df_all['Foo'].duplicated(keep='last')
dv_inverse = df_all[m]
print(dv_inverse)
Foo Bar Baz
0 1 1 1
2 2 1 2
3 2 3 4
df_slice = df_all[~m]
print(df_slice)
Foo Bar Baz
1 1 2 2
4 2 5 6
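As a quick sanity check, the two pieces should split df_all with no overlap and nothing missing. This is a minimal sketch on the question's sample data; the frame construction and the name recombined are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question (construction is assumed).
df_all = pd.DataFrame(
    {"Foo": [1, 1, 2, 2, 2], "Bar": [1, 2, 1, 3, 5], "Baz": [1, 2, 2, 4, 6]}
)

m = df_all["Foo"].duplicated(keep="last")
dv_inverse = df_all[m]   # every row that is not the last of its Foo group
df_slice = df_all[~m]    # the last row of each Foo group

# Stitch the two pieces back together and restore the original row order.
recombined = pd.concat([dv_inverse, df_slice]).sort_index()

print(dv_inverse.index.intersection(df_slice.index).empty)  # True: no shared rows
print(recombined.index.tolist() == df_all.index.tolist())   # True: no rows lost
```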