What causes the difference between these 2 dataframes?

Time:05-18

When I create the following dataframe "names"

0       Max
1    Albert
2     Marie
3     Niels
Name: name, dtype: object

and save it to the disk using

names.to_csv("names.csv", index=False)

and load the same csv file using

names_new = pd.read_csv("names.csv")

the names_new looks like this

     name
0     Max
1  Albert
2   Marie
3   Niels

and

names.equals(names_new)

returns False.

When I check, they both have the same length. So what is causing the difference, and how can I make sure that saving a dataframe to csv and loading it again will create the same dataframe?

CodePudding user response:

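names is a Series, while names_new is a DataFrame. To compare like with like, select the column name from names_new so that both sides are Series: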

print(names.equals(names_new['name']))

If names might not have the default index, reset it before comparing:

print(names.reset_index(drop=True).equals(names_new['name']))
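
The two comparisons above can be tried end to end with a minimal sketch like the one below. It assumes names was built as a Series called "name" (which is what the printed output in the question suggests) and writes to a temporary directory instead of the working directory. The shifted Series and its index values are hypothetical, added only to show the non-default index case.

```python
import os
import tempfile

import pandas as pd

# Assumption: the original object is a Series named "name".
names = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")

# Save and reload, as in the question, but inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "names.csv")
    names.to_csv(path, index=False)
    names_new = pd.read_csv(path)  # read_csv always returns a DataFrame

# Series compared with a DataFrame: the types differ, so equals() is False.
series_vs_frame = names.equals(names_new)

# Series compared with the matching column: same values, dtype and index.
series_vs_column = names.equals(names_new["name"])

# Hypothetical case: the original Series carries a non-default index.
shifted = names.copy()
shifted.index = [10, 11, 12, 13]
shifted_direct = shifted.equals(names_new["name"])
shifted_reset = shifted.reset_index(drop=True).equals(names_new["name"])

print(series_vs_frame, series_vs_column, shifted_direct, shifted_reset)
```

Only the comparisons that line up both the type and the index report equality, which is why reset_index(drop=True) is needed when the saved object did not use the default index.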

CodePudding user response:

The .equals() method tests whether two Series or DataFrames contain the same elements. Related comparison tools are:

->Series.eq / DataFrame.eq

Compare two Series/DataFrame objects of the same length and return a Series/DataFrame where each element is True if the corresponding elements are equal, False otherwise.

->testing.assert_series_equal

Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.

->testing.assert_frame_equal

Like assert_series_equal, but targets DataFrames.
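
To make the difference between these tools concrete, here is a small sketch. The Series left, right and renamed are made-up examples (they are not from the question), chosen so that one value and the Series names differ.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Hypothetical data: one value differs, and so does the Series name.
left = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")
right = pd.Series(["Max", "Albert", "Marie", "Nils"], name="other")

# eq() compares element by element and returns a boolean Series.
elementwise = left.eq(right)

# equals() collapses the comparison into a single boolean.
whole = left.equals(right)

# assert_series_equal raises AssertionError and describes the mismatch.
try:
    assert_series_equal(left, right, check_names=False)
    mismatch_message = None
except AssertionError as err:
    mismatch_message = str(err)

# With identical values, check_names=False lets a different name pass.
renamed = left.rename("other")
assert_series_equal(left, renamed, check_names=False)

print(elementwise.tolist(), whole)
print(mismatch_message)
```

The testing helpers are handy while debugging because the error message says what differs, whereas equals() only reports that something does.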

It is a good question why .equals() returns False here. Furthermore, you can check whether the left and right DataFrames are equal with the following code, which raises an AssertionError:

from pandas.testing import assert_frame_equal
assert_frame_equal(names, names_new)


AssertionError: DataFrame.columns are different

You can either make the column names the same, or squeeze both objects to Series and then compare them as below:

names.squeeze().equals(names_new.squeeze())
OUTPUT: True
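
For the round trip asked about in the question, one possible approach is to convert the loaded DataFrame back to the shape that was saved. The sketch below assumes names is a single-column Series named "name" and uses an in-memory buffer rather than a file. It is one way to do it, not the only one.

```python
import io

import pandas as pd

# Assumption: the saved object is a Series named "name".
names = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")

# With no path given, to_csv returns the CSV text as a string.
csv_text = names.to_csv(index=False)

# Option 1: squeeze the single-column DataFrame back into a Series.
as_series = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")
series_round_trip = names.equals(as_series)

# Option 2: turn the original Series into a DataFrame and compare frames.
as_frame = pd.read_csv(io.StringIO(csv_text))
frame_round_trip = names.to_frame().equals(as_frame)

print(series_round_trip, frame_round_trip)
```

Note that CSV stores only text, so for other data (dates, categoricals, a custom index) the dtypes or index may still need to be restored explicitly after loading.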