Remove the self/other row in a comparison DataFrame

Time:11-02

I am running a very simple script (basically a test for a script that will use a much larger dataset):

import pandas as pd

Data1 = {'First Name': ["Chris" , "John", "Jane"], 
        'Last Name': ["Potter","Doe", "Doe"],
        'Age': ["23", "32", "31"]}

Data2 = {'First Name': ["George" , "John", "Jane"], 
        'Last Name': ["Hall","Doe", "Doe"],
        'Age': ["27", "32", "31"]}

df1 = pd.DataFrame(Data1)
df2 = pd.DataFrame(Data2)

Comparison = df1.compare(df2, keep_shape=True, keep_equal=True)

print(df1)
print(df2)
print(Comparison)

This produces a Comparison DataFrame that looks like this:

  First Name         Last Name        Age      
        self   other      self other self other
0      Chris  George    Potter  Hall   23    27
1       John    John       Doe   Doe   32    32
2       Jane    Jane       Doe   Doe   31    31

My question is whether there is a way to remove or manipulate the self/other row. I couldn't find anything on Google either.

CodePudding user response:

If you want to remove the second level of your column index, use droplevel:

>>> df1.compare(df2, keep_shape=True, keep_equal=True).droplevel(1, axis=1)
  First Name First Name Last Name Last Name Age Age
0      Chris     George    Potter      Hall  23  27
1       John       John       Doe       Doe  32  32
2       Jane       Jane       Doe       Doe  31  31
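One thing to keep in mind with this approach: after droplevel the column labels are duplicated, so selecting by name returns both columns at once. A minimal sketch using the data from the question (the variable name flat is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'First Name': ["Chris", "John", "Jane"],
                    'Last Name': ["Potter", "Doe", "Doe"],
                    'Age': ["23", "32", "31"]})
df2 = pd.DataFrame({'First Name': ["George", "John", "Jane"],
                    'Last Name': ["Hall", "Doe", "Doe"],
                    'Age': ["27", "32", "31"]})

# Drop the self/other level; each original column name now appears twice.
flat = df1.compare(df2, keep_shape=True, keep_equal=True).droplevel(1, axis=1)
print(list(flat.columns))

# Selecting a duplicated label returns a two-column DataFrame, not a Series.
ages = flat["Age"]
print(ages.shape)
```

If you need to address the self and other values separately later on, renaming the columns instead of dropping the level may be more convenient.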

CodePudding user response:

  1. You can use reset_index:

Because reset_index only works on the row index, you have to transpose, call reset_index, and then transpose back:

Comparison = Comparison.T.reset_index(drop=True).T

This replaces your column names with integers, so you will have to set the names again afterwards.

Output:

       0       1       2     3   4   5
0  Chris  George  Potter  Hall  23  27
1   John    John     Doe   Doe  32  32
2   Jane    Jane     Doe   Doe  31  31
  1. Other option is just rename the Comparison columns:

Simply set the column names again after calling df1.compare, with something like this:

Comparison.columns = [el[0] + "_" + el[1] for el in Comparison.columns.values]

Output:

  First Name_self First Name_other  ... Age_self Age_other
0           Chris           George  ...       23        27
1            John             John  ...       32        32
2            Jane             Jane  ...       31        31
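Both options can be tried side by side with the data from the question. This is only a sketch: the names numbered, flat and positional_labels are illustrative, and the "name_side" naming scheme is just one possible choice.

```python
import pandas as pd

df1 = pd.DataFrame({'First Name': ["Chris", "John", "Jane"],
                    'Last Name': ["Potter", "Doe", "Doe"],
                    'Age': ["23", "32", "31"]})
df2 = pd.DataFrame({'First Name': ["George", "John", "Jane"],
                    'Last Name': ["Hall", "Doe", "Doe"],
                    'Age': ["27", "32", "31"]})

comparison = df1.compare(df2, keep_shape=True, keep_equal=True)

# Option 1: transpose, reset the index, transpose back.
# The columns become plain integers 0..5.
numbered = comparison.T.reset_index(drop=True).T
positional_labels = list(numbered.columns)
# Set readable names again, built from the original two-level labels.
numbered.columns = [f"{name}_{side}" for name, side in comparison.columns]

# Option 2: flatten the two column levels directly, no transpose needed.
flat = comparison.copy()
flat.columns = ["_".join(pair) for pair in comparison.columns]

print(positional_labels)
print(list(flat.columns))
```

Option 2 avoids the double transpose, which can also change column dtypes to object when the columns hold mixed types.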

CodePudding user response:

Hi, I think they are immutable. One way you can change the data is this: Data1['First Name'][0] = 'Kronivar'. The printed output then changes from the next line on.

Use drop to delete
