I have two dataframes as shown below, df1
and df2
:
df1 =
emp_name emp_city counts
emp_id
2 two city2 3
4 fourxxx city4 1
5 five city5 1
df2 =
emp_name emp_city counts
emp_id
2 two city2 1
3 three city3 1
4 four city4 1
Note: 'emp_id' acts as index.
I want to find the difference between df1
and df2
and write the name of the columns which has mismatched values. The below code snippet will do that.
df3 = df2.copy()
df3['mismatch_col'] = df2.ne(df1, axis=1).dot(df2.columns)
Results in df3
:
df3 =
emp_name emp_city counts mismatch_col
emp_id
2 two city2 1 counts
3 three city3 1 emp_nameemp_citycounts
4 four city4 1 emp_name
Now the problem I have is with respect to 'mismatch_col'. It is giving me the names of columns where there is a mismatch in df1
and df2
. But, the column names are NOT separated. I want to separate the column names by commas. Expected output should look like below:
Expected_df3 =
emp_name emp_city counts mismatch_col
emp_id
2 two city2 1 counts
3 three city3 1 emp_name,emp_city,counts
4 four city4 1 emp_name
Can someone please help me on this?
CodePudding user response:
You can use df2.columns ','
to add commas and then str[:-1]
to remove the last one:
df3['mismatch_col'] = df2.ne(df1, axis=1).dot(df2.columns ',').str[:-1]
Result:
emp_name emp_city counts mismatch_col
emp_id
2 two city2 1 counts
3 three city3 1 emp_name,emp_city,counts
4 four city4 1 emp_name