I currently have two DataFrames that should be identical, but they have different sizes. How do I compare the two DataFrames using pandas to find the data that differ?
I couldn't use df_control.eq() for this. For example:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [14, 22], 'col2': [32, 22]})
df.eq(df2)
Output:
    col1   col2
0  False  False
1  False  False
In practice, I want to compare two DataFrames that hold a large amount of data and filter the rows that differ, for data validation.
Expected:
col1  col2  Verify
   1     3   False
   2     4   False
  14    32   False
  22    22    True
CodePudding user response:
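# Stack both frames vertically; the original row labels (0, 1, 0, 1) are kept.
result = pd.concat([df, df2])
ar = result.to_numpy()
# Flag a row as True when every value in it equals the value in its first column.
result['Verify'] = (ar[:, [0]] == ar).all(axis=1)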
Result:
   col1  col2  Verify
0     1     3   False
1     2     4   False
0    14    32   False
1    22    22    True
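If the goal is instead to find rows that appear in only one of the two frames (which is how I read "filter the rows that are different"), an outer merge with indicator=True may be closer to what is needed. This is only a sketch under that assumption: df_other is a hypothetical second frame with one extra row, made up here to show the different-size case, and diff is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
# Hypothetical second frame of a different size; it shares the row (1, 3) with df.
df_other = pd.DataFrame({'col1': [14, 22, 1], 'col2': [32, 22, 3]})

# Outer merge on all shared columns; the `_merge` column records
# whether each row came from the left frame, the right frame, or both.
merged = df.merge(df_other, how='outer', indicator=True)

# Rows that are not present in both frames are the mismatches.
diff = merged[merged['_merge'] != 'both']
print(diff)
```

Here (2, 4) is reported as left_only, while (14, 32) and (22, 22) are reported as right_only. Note that duplicated rows within one frame can multiply matches in a merge, so it may be worth dropping duplicates first if that matters for the validation.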