When I create the following dataframe "names"
0 Max
1 Albert
2 Marie
3 Niels
Name: name, dtype: object
and save it to the disk using
names.to_csv("names.csv", index=False)
and load the same csv file using
names_new = pd.read_csv("names.csv")
names_new looks like this:
name
0 Max
1 Albert
2 Marie
3 Niels
and
names.equals(names_new)
returns False.
When I check, they both have the same length. So what is causing the difference, and how can I make sure that saving a dataframe to CSV and loading it again will create the same dataframe?
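The last line of the first printout, "Name: name, dtype: object", is how pandas prints a Series, so names is most likely a Series rather than a DataFrame. The minimal reproduction below assumes the Series is built directly from the four names shown (the question does not say how it was created). An in-memory buffer (io.StringIO) stands in for names.csv. Series.equals() returns False as soon as the other object is not a Series, which is what happens here:

```python
import io

import pandas as pd

# Assumption: names is a Series named "name", as its printed repr suggests.
names = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")

# Round trip through CSV; the in-memory buffer stands in for names.csv.
# to_csv() on a Series writes the Series name ("name") as a header row.
csv_text = names.to_csv(index=False)
names_new = pd.read_csv(io.StringIO(csv_text))

print(type(names).__name__)      # Series
print(type(names_new).__name__)  # DataFrame
print(names.equals(names_new))   # False: a Series never equals a DataFrame
```

read_csv() always returns a DataFrame, so the header "name" comes back as a column label rather than as a Series name.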
Answer:
To compare against the Series, select the name column:
print(names.equals(names_new['name']))
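A self-contained sketch of this fix, assuming names is the Series from the question (built here by hand) and using an in-memory buffer in place of names.csv. Selecting the column gives a Series on both sides, and equals() then compares index and values:

```python
import io

import pandas as pd

# Assumption: names is the Series shown in the question.
names = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")

# Round trip; the in-memory buffer stands in for names.csv.
csv_text = names.to_csv(index=False)
names_new = pd.read_csv(io.StringIO(csv_text))

# Selecting the "name" column turns the one-column DataFrame back into a Series.
restored = names_new["name"]
print(restored.name)           # name
print(names.equals(restored))  # True
```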
If names might not have the default index (0, 1, 2, ...), reset it before comparing:
print(names.reset_index(drop=True).equals(names_new['name']))
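Why the reset is needed: to_csv(index=False) does not write the index, so read_csv() always returns a fresh 0-based index, and equals() also compares indexes. The sketch below uses a hypothetical index [10, 11, 12, 13] (for example, what filtering or sorting could leave behind) to show the difference:

```python
import io

import pandas as pd

# Hypothetical: a Series whose index is not the default 0..n-1.
names = pd.Series(
    ["Max", "Albert", "Marie", "Niels"], index=[10, 11, 12, 13], name="name"
)

csv_text = names.to_csv(index=False)            # the index is NOT written
names_new = pd.read_csv(io.StringIO(csv_text))  # comes back indexed 0..3

print(names.equals(names_new["name"]))                         # False: indexes differ
print(names.reset_index(drop=True).equals(names_new["name"]))  # True
```

If the index carries meaning, writing it (index=True) and reading it back with index_col=0 would preserve it instead of discarding it.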
Answer:
.equals() checks whether two objects are both Series (or both DataFrames) with the same shape and elements. Related functions are:
->Series.eq / DataFrame.eq
Compare two Series/DataFrame objects of the same length and return a Series/DataFrame where each element is True if the corresponding elements are equal, False otherwise.
->testing.assert_series_equal
Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
->testing.assert_frame_equal
Like assert_series_equal, but targets DataFrames.
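A short sketch of two of these on the round-tripped data, assuming names is the Series from the question and using an in-memory buffer for the CSV. eq() compares element by element, while assert_series_equal() raises an AssertionError describing the first difference it finds (and, unlike equals(), also compares the Series names by default):

```python
import io

import pandas as pd
from pandas.testing import assert_series_equal

# Assumption: names is the Series shown in the question.
names = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")
names_new = pd.read_csv(io.StringIO(names.to_csv(index=False)))
restored = names_new["name"]

# Series.eq: element-wise comparison returning a boolean Series.
print(names.eq(restored).tolist())  # [True, True, True, True]

# assert_series_equal: returns None on success, raises AssertionError otherwise.
assert_series_equal(names, restored)
print("assert_series_equal passed")
```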
It is a good question why .equals() returns False even when both objects are treated as DataFrames. To find out, check whether the left and right DataFrames are equal with assert_frame_equal, which reports what differs:
from pandas.testing import assert_frame_equal
assert_frame_equal(names, names_new)
AssertionError: DataFrame.columns are different
You can either make the column names the same, or turn both objects into Series with squeeze() and compare those:
names.squeeze().equals(names_new.squeeze())
OUTPUT: True
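The other option, making the column names the same, can be sketched with Series.to_frame(), assuming names is the Series named "name" from the question. to_frame() uses the Series name as the column label, which matches the header that read_csv() reads back (an in-memory buffer stands in for names.csv):

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Assumption: names is the Series shown in the question.
names = pd.Series(["Max", "Albert", "Marie", "Niels"], name="name")
names_new = pd.read_csv(io.StringIO(names.to_csv(index=False)))

# One-column DataFrame whose column label is the Series name ("name").
names_df = names.to_frame()

print(names_df.equals(names_new))        # True
assert_frame_equal(names_df, names_new)  # passes silently
```

Which direction to convert is a matter of taste: squeeze() suits a single-column result, while to_frame() keeps everything as DataFrames if the rest of the code expects them.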