How to compare 2 dataframes and then output an ID to inform if a row has changed?-CodePudding

I have 2 dataframes:

df1 = pd.DataFrame({"id1": ["A", "B", "C", "D"], "id2": ["1", "2", "2", "1"], "id3": ["33", "232", "343", "555"]})

df2 = pd.DataFrame({"id1": ["A", "B", "F", "C", "D", "E"], "id2": ["1", "2", "2", "1", "1", "2"], "id3": ["33", "11", "77", "99", "555","88"]})

I would like to get an output which tells me which rows of df2 have been modified or not (Y for yes and N for No) such as the following:

	id1	Modified_ID
0	A	N
1	B	Y
2	F	N
3	C	Y
4	D	N
5	E	N

CodePudding user response：

You can merge and build a boolean mask that returns True if a value didn't change (or was expanded) and map Y/N values according to it:

df2 = df2.merge(df1, on='id1', how='left', suffixes=('_',''))
df2['Modified_ID'] = (df2['id2_'].eq(df2['id2']) | df2['id2'].isna()).map({True:'N', False:'Y'})
df2 = df2.drop(columns=['id2_','id2'])

Output:

  id1 Modified_ID
0   A           N
1   B           Y
2   F           N
3   C           Y
4   D           N
5   E           N

For more than 1 column, use:

df2 = df2.merge(df1, on='id1', how='left', suffixes=('_',''))
cols = ['id2','id3']
df2['Modified_ID'] = (df2[cols].eq(df2[[f'{c}_' for c in cols]].to_numpy()).all(axis=1) | df2[cols].isna().all(axis=1)).map({True:'N', False:'Y'})
df2 = df2.drop(columns=['id2_','id2','id3_','id3'])

Output:

  id1 Modified_ID
0   A           N
1   B           Y
2   F           N
3   C           Y
4   D           N
5   E           N