Home > OS >  Pandas Dataframe Comparison - specify mismatched columns
Pandas Dataframe Comparison - specify mismatched columns

Time:11-22

I have two dataframes as shown below, df1 and df2:

df1 =
            emp_name    emp_city    counts  
    emp_id              
    2       two         city2       3   
    4       fourxxx     city4       1   
    5       five        city5       1   

df2 =
            emp_name    emp_city    counts  
    emp_id              
    2       two         city2       1   
    3       three       city3       1   
    4       four        city4       1   

Note: 'emp_id' acts as index.
I want to find the difference between df1 and df2 and write the name of the columns which has mismatched values. The below code snippet will do that.

df3 = df2.copy()
df3['mismatch_col'] = df2.ne(df1, axis=1).dot(df2.columns)

Results in df3:

df3 = 
        emp_name    emp_city    counts  mismatch_col
emp_id                  
2       two         city2       1       counts
3       three       city3       1       emp_nameemp_citycounts
4       four        city4       1       emp_name

Now the problem I have is with respect to 'mismatch_col'. It is giving me the names of columns where there is a mismatch in df1 and df2. But, the column names are NOT separated. I want to separate the column names by commas. Expected output should look like below:

Expected_df3 = 
            emp_name    emp_city    counts  mismatch_col
    emp_id                  
    2       two         city2       1       counts
    3       three       city3       1       emp_name,emp_city,counts
    4       four        city4       1       emp_name

Can someone please help me on this?

CodePudding user response:

You can use df2.columns ',' to add commas and then str[:-1] to remove the last one:

df3['mismatch_col'] = df2.ne(df1, axis=1).dot(df2.columns   ',').str[:-1]

Result:

        emp_name  emp_city  counts              mismatch_col
emp_id              
2            two     city2       1                    counts
3          three     city3       1  emp_name,emp_city,counts
4           four     city4       1                  emp_name
  • Related