I have 2 dataframes:
df1 = pd.DataFrame({"id1": ["A", "B", "C", "D"], "id2": ["1", "2", "2", "1"], "id3": ["33", "232", "343", "555"]})
df2 = pd.DataFrame({"id1": ["A", "B", "F", "C", "D", "E"], "id2": ["1", "2", "2", "1", "1", "2"], "id3": ["33", "11", "77", "99", "555","88"]})
I would like to get an output which tells me which rows of df2
have been modified or not (Y for yes and N for No) such as the following:
id1 | Modified_ID | |
---|---|---|
0 | A | N |
1 | B | Y |
2 | F | N |
3 | C | Y |
4 | D | N |
5 | E | N |
CodePudding user response:
You can merge
and build a boolean mask that returns True if a value didn't change (or was expanded) and map
Y/N values according to it:
df2 = df2.merge(df1, on='id1', how='left', suffixes=('_',''))
df2['Modified_ID'] = (df2['id2_'].eq(df2['id2']) | df2['id2'].isna()).map({True:'N', False:'Y'})
df2 = df2.drop(columns=['id2_','id2'])
Output:
id1 Modified_ID
0 A N
1 B Y
2 F N
3 C Y
4 D N
5 E N
For more than 1 column, use:
df2 = df2.merge(df1, on='id1', how='left', suffixes=('_',''))
cols = ['id2','id3']
df2['Modified_ID'] = (df2[cols].eq(df2[[f'{c}_' for c in cols]].to_numpy()).all(axis=1) | df2[cols].isna().all(axis=1)).map({True:'N', False:'Y'})
df2 = df2.drop(columns=['id2_','id2','id3_','id3'])
Output:
id1 Modified_ID
0 A N
1 B Y
2 F N
3 C Y
4 D N
5 E N