I have a DataFrame with a column whose values are NumPy arrays. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame([{"id":1, "sample": np.array([1,2,3])}, {"id":2, "sample": np.array([2,3,4])}])
df.to_csv("./tmp.csv", index=False)
If I save df to CSV and load it again, the "sample" column comes back as strings:
df_from_csv = pd.read_csv("./tmp.csv")
df_from_csv.equals(pd.DataFrame([{"id":1, "sample": '[1 2 3]'}, {"id":2, "sample": '[2 3 4]'}]))
# True
Is there a better way to save/load my data that does not require manually converting '[1 2 3]' to its corresponding array?
CodePudding user response:
You can use a converter in read_csv:
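import re
from ast import literal_eval
import numpy as np
import pandas as pd
def to_array(x):
    # Replace each run of whitespace with a comma: '[1 2 3]' -> '[1,2,3]',
    # which literal_eval can parse as a Python list.
    # Note: this fails if the text has padding right after '[' (e.g. '[ 1 10]').
    return np.array(literal_eval(re.sub(r'\s+', ',', x)))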
df_from_csv = pd.read_csv("./tmp.csv", converters={'sample': to_array})
#    id     sample
# 0   1  [1, 2, 3]
# 1   2  [2, 3, 4]
df_from_csv.loc[0, 'sample']
# array([1, 2, 3])
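If you also control the writing side, one possible alternative is to serialize each array as a JSON list before calling to_csv, so the converter does not have to parse NumPy's printed form at all. This is only a sketch under a few assumptions: the arrays are numeric, an in-memory io.StringIO buffer stands in for "./tmp.csv", and from_json is a hypothetical helper name.

```python
import io
import json

import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "sample": np.array([1, 2, 3])},
    {"id": 2, "sample": np.array([2, 3, 4])},
])

# Write side: store each array as a JSON list such as "[1, 2, 3]".
out = df.copy()
out["sample"] = out["sample"].apply(lambda a: json.dumps(a.tolist()))

buffer = io.StringIO()  # assumption: stands in for "./tmp.csv"
out.to_csv(buffer, index=False)
buffer.seek(0)


def from_json(text):
    # Hypothetical helper: parse the JSON list back into a NumPy array.
    return np.array(json.loads(text))


# Read side: the converter restores the arrays while loading.
restored = pd.read_csv(buffer, converters={"sample": from_json})
```

The trade-off is that the CSV is no longer what a plain df.to_csv would produce, so this only helps when the same code both writes and reads the file.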