I have a pandas v1.3.4 dataframe, df_final, and am exporting it to a CSV file so I don't need to repeat the dataframe-building step every time I do an analysis. df_final will be a 13000 x 91 dataframe, but I am testing the process on a smaller 689 x 91 dataframe first.
I would like to confirm that the df_final_csv dataframe generated by reading in the df_final CSV is the same as the df_final dataframe. Based on the code below, it looks like they are different, but I'm not sure how. I copied some Stack Overflow code (below, adapted from here), but some other solutions (e.g.) don't work because I have list objects in my df_final. How can I find which value(s) are causing the issue?
If any other information would help please let me know.
import pandas as pd
# 689 rows x 91 columns (results is the dict built in the earlier step)
df_final = pd.DataFrame.from_dict(results)
print(f'NaN are present: {df_final.isnull().values.any()}')  # False
# export to csv
df_final.to_csv('integrated_df.csv')
# read in csv
df_final_csv = pd.read_csv('integrated_df.csv', index_col=0)
print(f'NaN are present: {df_final_csv.isnull().values.any()}')  # False
print(f'imported df is same as exported df: {df_final.equals(df_final_csv)}')  # False
# try to find discrepancies (--> empty df with only column headers)
different_values = df_final_csv[~df_final_csv.isin(df_final)].dropna()
Cheers!
CodePudding user response:
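Maybe some special characters got mangled by the CSV round trip. CSV also stores every value as text, so list objects come back as strings when the file is read. Try writing to a .pkl file instead: pickle serializes the Python objects themselves, so the reloaded dataframe should match the original.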
import pickle
# write the dataframe to a pickle file
with open("df.pkl", "wb") as f:
    pickle.dump(df_final, f)
# then read it back
with open("df.pkl", "rb") as f:
    df_new = pickle.load(f)
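If you want to stay with CSV and see which columns actually changed, here is a rough sketch of one way to do it. It assumes the list columns are the culprit, which I can't confirm without your data. The small df_final below is a made-up stand-in (the column names n and tags are hypothetical), and an in-memory buffer replaces integrated_df.csv so the example runs on its own.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for df_final: one integer column and one column of lists.
df_final = pd.DataFrame({"n": [1, 2, 3], "tags": [["a", "b"], ["c"], []]})

# Round-trip through CSV (an in-memory buffer is used here instead of a file).
buffer = io.StringIO()
df_final.to_csv(buffer)
buffer.seek(0)
df_final_csv = pd.read_csv(buffer, index_col=0)

# Compare column by column to narrow down where the two dataframes differ.
differing_columns = [
    col for col in df_final.columns
    if not df_final[col].equals(df_final_csv[col])
]
print(differing_columns)

# Inspect the Python type stored in each differing column, before and after.
for col in differing_columns:
    print(col, type(df_final[col].iloc[0]), type(df_final_csv[col].iloc[0]))

# Assumption: the differing columns hold string representations of lists,
# so ast.literal_eval can parse them back into list objects.
df_restored = df_final_csv.copy()
for col in differing_columns:
    df_restored[col] = df_restored[col].apply(ast.literal_eval)

print(df_final.equals(df_restored))
```

Note that ast.literal_eval only parses plain Python literals (lists, numbers, strings and so on), so it will raise an error if a column differs for some other reason. If you go the pickle route, df_final.to_pickle("df.pkl") and pd.read_pickle("df.pkl") do the same job as the pickle module calls above.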