I would like to use something similar to the function discussed in this topic: Using the isin() function on grouped data. However, I have two DataFrames of different lengths, both grouped by the same variable.
The function should group the column Dev_stage by Year in both DataFrames, compare the grouped data, and return the rows of one DataFrame that are missing from the other for the same year.
My snippet:
>>> df1
Out:
Dev_stage Year
0 1 1989
1 2 1989
2 2 1989
3 3 1989
4 1 1990
5 1 1990
6 3 1990
>>> df2
Out:
Dev_stage Year
0 1 1989
1 2 1989
2 2 1990
3 1 1990
4 3 1990
I was trying something like this:
out = lambda x, y: x[~x['Dev_stage'].isin(y['Dev_stage'])]
out(df1.groupby('Year'), df2.groupby('Year'))
But I get the error: 'SeriesGroupBy' object has no attribute 'isin'. I thought the lambda would solve this.
Expecting something like this:
out:
Dev_stage Year
3 3 1989
Thanks!
CodePudding user response:
IIUC, you can use an inner merge to find the rows of df1 whose values in the shared columns (Dev_stage and Year) also appear in df2, then filter those rows out by their index:
out = df1[~df1.index.isin(df1.reset_index().merge(df2, how='inner')['index'])]
print(out)
Dev_stage Year
3 3 1989
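The same idea can also be sketched as a left anti-join, using the indicator flag of merge so that the rows missing from df2 are selected directly. The frames below are rebuilt from the printout in the question; pairs and flagged are illustrative names, and the sketch assumes df1 has a unique index.

```python
import pandas as pd

# Frames rebuilt from the printout in the question.
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

# Distinct (Dev_stage, Year) pairs in df2, so the merge cannot duplicate df1 rows.
pairs = df2[["Dev_stage", "Year"]].drop_duplicates()

# A left merge with indicator=True adds a "_merge" column
# whose value is "left_only" or "both" for each df1 row.
flagged = df1.reset_index().merge(pairs, on=["Dev_stage", "Year"],
                                  how="left", indicator=True)

# Keep the rows whose pair has no match in df2 and restore the original index.
out = (flagged[flagged["_merge"] == "left_only"]
       .set_index("index")
       .rename_axis(None)
       .drop(columns="_merge"))
print(out)
```

On the frames as posted, this keeps only the row with index 3 (Dev_stage 3, Year 1989), because the pair (3, 1990) does occur in df2.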
CodePudding user response:
Use:
In [940]: df1[~df1.Dev_stage.isin(df2.Dev_stage)]
Out[940]:
Dev_stage Year
3 3 1989
6 3 1990
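Note that isin on Dev_stage alone ignores Year: a row is kept only if its stage never occurs in df2 in any year. If the comparison should be made per year, one possible sketch is to compare (Year, Dev_stage) pairs instead. The frames are rebuilt from the printout in the question; keys1, keys2 and mask are illustrative names.

```python
import pandas as pd

# Frames rebuilt from the printout in the question.
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

# Build one (Year, Dev_stage) key per row in each frame.
keys1 = pd.MultiIndex.from_frame(df1[["Year", "Dev_stage"]])
keys2 = pd.MultiIndex.from_frame(df2[["Year", "Dev_stage"]])

# MultiIndex.isin compares whole tuples, so a stage only counts as
# present when it appears in df2 for the same year.
mask = keys1.isin(keys2)

# Rows of df1 whose (Year, Dev_stage) pair is absent from df2.
out = df1[~mask]
print(out)
```

This avoids calling isin on a groupby object, which is what raised the 'SeriesGroupBy' error in the question.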