DataFrame.equals() returns False when comparing data frames with the same content but initialized di

Time:08-24

The following code is supposed to create two identical data frames, but the equality test returns False:

import pandas as pd

df1 = pd.DataFrame(columns=["A"])
df2 = pd.DataFrame({"A": []})
print(df1)
print(df2)
print(df1.equals(df2))

Here is the output produced by the code above:

Empty DataFrame
Columns: [A]
Index: []
Empty DataFrame
Columns: [A]
Index: []
False

Why does df1.equals(df2) return False?

CodePudding user response:

pandas.testing.assert_frame_equal compares two data frames and, when they differ, reports where:

import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame(columns=["A"])
df2 = pd.DataFrame({"A": []})

# Raises AssertionError describing the first mismatch it finds
assert_frame_equal(df1, df2)

Output

DataFrame.index classes are not equivalent
[left]:  Index([], dtype='object')
[right]: RangeIndex(start=0, stop=0, step=1)

Then, with the index reset on both frames so the index classes match:

assert_frame_equal(df1.reset_index(drop=True), df2.reset_index(drop=True))

Output

Attribute "dtype" are different
[left]:  object
[right]: float64
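
The two differences reported above (index class and column dtype) can also be inspected directly. The following is a minimal sketch; `describe` is a hypothetical helper name, not part of pandas, and the values it prints depend on the installed pandas version, since the defaults for empty frames have changed between releases.

```python
import pandas as pd

df1 = pd.DataFrame(columns=["A"])
df2 = pd.DataFrame({"A": []})

def describe(df):
    """Summarize the two attributes the comparison tripped over (hypothetical helper)."""
    return {
        "index_type": type(df.index).__name__,                      # e.g. Index or RangeIndex
        "dtypes": {col: str(dt) for col, dt in df.dtypes.items()},  # e.g. {'A': 'object'}
    }

# Both frames print as "Empty DataFrame", but their metadata can differ
print(describe(df1))
print(describe(df2))
```

If the two dictionaries differ, the frames differ in metadata even though they print identically.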

Finally, resetting the index on both frames and casting df2 to object makes the comparison return True:

df1.reset_index(drop=True).equals(df2.astype(object).reset_index(drop=True))
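
The same idea can be wrapped in a small function for reuse. This is only a sketch under the assumption that index labels and column dtypes do not matter for the comparison; `normalized_equals` is a hypothetical name, and casting both sides to object deliberately discards dtype information.

```python
import pandas as pd

def normalized_equals(left, right):
    """Compare values only: reset both indexes and cast every column to object.

    Hypothetical helper; it ignores index labels and column dtypes on purpose.
    """
    a = left.reset_index(drop=True).astype(object)
    b = right.reset_index(drop=True).astype(object)
    return a.equals(b)

df1 = pd.DataFrame(columns=["A"])
df2 = pd.DataFrame({"A": []})

print(normalized_equals(df1, df2))
```

When the dtype is known up front, declaring it at construction time, for example pd.DataFrame({"A": pd.Series([], dtype=object)}), avoids the cast afterwards.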