Using the isin() function on grouped data from two dataframes

I would like to use something similar to the approach discussed in the topic "Using the isin() function on grouped data", but with two DataFrames of different lengths, both grouped by the same variable.

The function should group the Dev_stage column by Year in both DataFrames, compare the groups, and return the rows of df1 whose Dev_stage does not appear in the same Year group of df2.

My snippet:

>>> df1
Out:
    Dev_stage Year
0   1         1989
1   2         1989
2   2         1989
3   3         1989
4   1         1990
5   1         1990
6   3         1990

>>> df2
Out:
    Dev_stage Year
0   1         1989
1   2         1989
2   2         1990
3   1         1990
4   3         1990

I was trying something like this:

out = lambda x, y: x[~x['Dev_stage'].isin(y['Dev_stage'])]
out(df1.groupby('Year'), df2.groupby('Year'))

But I get the error: 'SeriesGroupBy' object has no attribute 'isin'. I thought the lambda would solve this.

Expecting something like this:

out:   
    Dev_stage Year
3   3         1989

Thanks!

CodePudding user response：

IIUC, you can use an inner merge to find the rows of df1 whose (Dev_stage, Year) pair also occurs in df2, then filter those rows out of df1:

out = df1[~df1.index.isin(df1.reset_index().merge(df2, how='inner')['index'])]

print(out)

   Dev_stage  Year
3          3  1989
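
The same anti-join idea can also be written with merge's indicator flag. This is only a sketch, assuming the sample frames from the question are rebuilt as below; the names merged and out are just illustrative:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

# Left-merge on both columns; the _merge column records where each row was found.
# drop_duplicates() keeps repeated df2 rows from multiplying df1 rows.
merged = df1.reset_index().merge(
    df2.drop_duplicates(), on=["Dev_stage", "Year"], how="left", indicator=True
)

# Keep only the rows that have no partner in df2, restoring df1's original index
out = merged.loc[merged["_merge"] == "left_only"].set_index("index")[["Dev_stage", "Year"]]
out.index.name = None
print(out)
```

On the sample data this leaves only row 3 (Dev_stage 3, Year 1989), because every other (Dev_stage, Year) pair in df1 also occurs in df2.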

CodePudding user response：

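Note that a plain isin() on the column compares against every Dev_stage value in df2, regardless of Year, so on the sample data it filters out every row:

In [940]: df1[~df1.Dev_stage.isin(df2.Dev_stage)]
Out[940]: 
Empty DataFrame
Columns: [Dev_stage, Year]
Index: []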
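
The isin() call above checks against all Dev_stage values of df2 at once. Here is a rough sketch of a per-year comparison, which is closer to what the question asks for. It assumes the sample frames from the question; the names parts, seen and out are only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

parts = []
# Iterate over the Year groups of df1; isin() works on each group's plain Series
for year, group in df1.groupby("Year"):
    # Dev_stage values that df2 has for the same year (empty if the year is absent)
    seen = df2.loc[df2["Year"] == year, "Dev_stage"]
    parts.append(group[~group["Dev_stage"].isin(seen)])

# Stitch the per-year leftovers back together
out = pd.concat(parts)
print(out)
```

This avoids the 'SeriesGroupBy' error because isin() is called on each group's Series, not on the GroupBy object itself. On the sample data it returns only row 3 (Dev_stage 3, Year 1989).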