Home > OS >  combine and group rows from 2 dfs
combine and group rows from 2 dfs

Time:03-28

I have 2 dfs, which I want to combine as the following:

df1 = pd.DataFrame({"a": [1,2], "b":['A','B'], "c":[3,2]})


df2 = pd.DataFrame({"a": [1,1,1, 2,2,2, 3, 4], "b":['A','A','A','B','B', 'B','C','D'], "c":[3, None,None,2,None,None,None,None]})

Output: 
    a   b   c
   1   A   3.0
   1   A   NaN
   1   A   NaN
   2   B   2.0
   2   B   NaN
   2   B   NaN

I had an earlier version of this question that only involved df2 and was solved with

df.groupby(['a','b']).filter(lambda g: any(~g['c'].isna())) 

but now I need to run it only for rows that appear in df1 (df2 contains rows from df1 but some extra rows which I want to not be included.

Thanks!

CodePudding user response:

IIUC, you could merge:

out = df2.merge(df1[['a','b']])

or you could use chained isin:

out1 = df2[df2['a'].isin(df1['a']) & df2['b'].isin(df1['b'])]

Output:

   a  b    c
0  1  A  3.0
1  1  A  NaN
2  1  A  NaN
3  2  B  2.0
4  2  B  NaN
5  2  B  NaN

CodePudding user response:

You can turn the indicator on with merge

out = df2.merge(df1,indicator=True,how='outer',on=['a','b'])
Out[91]: 
   a  b  c_x  c_y     _merge
0  1  A  3.0  3.0       both
1  1  A  NaN  3.0       both
2  1  A  NaN  3.0       both
3  2  B  2.0  2.0       both
4  2  B  NaN  2.0       both
5  2  B  NaN  2.0       both
6  3  C  NaN  NaN  left_only
7  4  D  NaN  NaN  left_only

out = out[out['_merge']=='both']
  • Related