find different values in two dataframes exported to and imported from the same CSV

Time:11-07

I have a df_final pandas v1.3.4 dataframe and am exporting it to a CSV file so I don't need to repeat the dataframe building step every time I do an analysis. df_final will be a 13000 x 91 dataframe, but I am testing the process on a smaller 689x91 dataframe first.

I would like to confirm that the df_final_csv dataframe generated by reading in the df_final CSV is the same as the df_final dataframe. Based on the code below, it looks like they are different, but I'm not sure how. I copied some Stack Overflow code (below), but some other solutions don't work because I have list objects in my df_final. How can I find which value(s) are causing the issue?

If any other information would help please let me know.

import pandas as pd

# 689 rows x 91 columns; `results` is built in an earlier step (not shown)
df_final = pd.DataFrame.from_dict(results)
print(f'NaN are present: {df_final.isnull().values.any()}')  # False

# export to csv
df_final.to_csv('integrated_df.csv')

# read in csv
df_final_csv = pd.read_csv('integrated_df.csv', index_col=0)
print(f'NaN are present: {df_final_csv.isnull().values.any()}')  # False
print(f'imported df is same as exported df: {df_final.equals(df_final_csv)}')  # False

# try and find discrepancies (--> empty df)
different_values = df_final_csv[~df_final_csv.isin(df_final)].dropna()  # empty df with only column headers

Cheers!

CodePudding user response:

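Maybe some special characters got mangled in the CSV round trip. Try writing to a .pkl file instead: pickle stores the Python objects themselves rather than a text representation, so you should get exactly the same data back.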

import pickle

# write the dataframe to a pickle file
with open("df.pkl", "wb") as f:
    pickle.dump(df_final, f)

# then read it back
with open("df.pkl", "rb") as f:
    df_new = pickle.load(f)
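
If you want to stay with CSV and see which values changed, a rough sketch like the one below might help. It compares the two frames cell by cell, one column at a time, which also works when cells hold lists. The helper name find_roundtrip_differences, the toy data, and the "tags" column are made up for illustration; the sketch assumes both frames have the same shape and columns and contain no NaN (NaN never compares equal to itself). In this toy case the list column comes back from the CSV as plain strings, which may or may not be what is happening in the real df_final.

```python
import ast
import io

import pandas as pd


def find_roundtrip_differences(original, reloaded):
    """Return {column: boolean Series marking the rows whose values differ}."""
    differences = {}
    for col in original.columns:
        # Compare plain Python objects pairwise so that list cells are handled.
        mask = pd.Series(
            [a != b for a, b in zip(original[col], reloaded[col])],
            index=original.index,
        )
        if mask.any():
            differences[col] = mask
    return differences


# Toy stand-in for df_final, with one column holding list objects.
df_final = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "tags": [["x", "y"], ["z"]]}
)

# Round-trip through CSV in memory instead of writing a file.
buffer = io.StringIO()
df_final.to_csv(buffer)
buffer.seek(0)
df_final_csv = pd.read_csv(buffer, index_col=0)

diffs = find_roundtrip_differences(df_final, df_final_csv)
for col, mask in diffs.items():
    print(col, "differs in rows", list(mask[mask].index))
    print("  original type:", type(df_final[col].iloc[0]).__name__)
    print("  reloaded type:", type(df_final_csv[col].iloc[0]).__name__)

# Possible repair for the toy case: parse the strings back into lists.
restored = df_final_csv.copy()
restored["tags"] = restored["tags"].apply(ast.literal_eval)
remaining = find_roundtrip_differences(df_final, restored)
```

With real data you would pass your own df_final and df_final_csv to the helper and inspect whichever columns it reports. ast.literal_eval only makes sense for columns whose cells are string representations of Python literals.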