I am trying to save and read back a data frame that has the following format:
index sentence
0 ['aa', 'bb', 'cc']
1 ['dd', 'ee', 'ff']
When I read the saved CSV file back and convert the 'sentence' column to a list with tolist(), each element of the resulting list is the string "['aa', 'bb', 'cc']" (brackets and quotes included) rather than an actual list.
Is there a way to read the column back as a list of lists of strings, i.e. [['aa', 'bb', 'cc'], ['dd', 'ee', 'ff'], ...]?
Or is there a recommended format for saving the sentence column in the first place?
Answer:
Your problem lies with the saving method. CSV has no list type: to_csv writes each list as its string representation, so the values come back as plain strings unless you explicitly parse them after reading.
Could you save yourself time and effort by using another format instead? JSON natively supports lists (as arrays) and is also easy for humans to read.
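If you need to stay with CSV, one option (not part of the original answer) is to parse the strings back into lists while reading. A minimal sketch, assuming the column was written by to_csv from Python lists of strings: the standard library's ast.literal_eval safely evaluates each "['aa', 'bb', 'cc']" literal, and read_csv's converters parameter applies it to every cell of the column. io.StringIO stands in for the file so the example runs on its own.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({"sentence": [['aa', 'bb', 'cc'], ['dd', 'ee', 'ff']]})

# to_csv stores each list as its str() form, e.g. "['aa', 'bb', 'cc']"
csv_text = df.to_csv()

# Without a converter, the column comes back as plain strings
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(type(raw["sentence"].iloc[0]))

# ast.literal_eval turns each string literal back into a real list
parsed = pd.read_csv(io.StringIO(csv_text), index_col=0,
                     converters={"sentence": ast.literal_eval})
sentences = parsed["sentence"].tolist()
print(sentences)
```

Unlike eval, ast.literal_eval only accepts Python literals, so it will not execute arbitrary code found in the file.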
Here is an obligatory snippet for you:
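import pandas as pd
# Each cell of the "sentence" column holds a Python list of strings
df = pd.DataFrame([{"sentence":['aa', 'bb', 'cc']},{"sentence":['dd', 'ee', 'ff']}])
# JSON has a native array type, so the lists are stored as arrays
df.to_json("myfile.json")
df2 = pd.read_json("myfile.json")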
This gives the following result:
>>> df2
sentence
0 [aa, bb, cc]
1 [dd, ee, ff]
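To check that the JSON round trip answers the original question, the sketch below calls tolist() on the column read back from JSON. It uses io.StringIO instead of myfile.json so it is self-contained; this wrapper is an assumption of the example, chosen because recent pandas versions warn when read_json is passed a literal JSON string.

```python
import io

import pandas as pd

df = pd.DataFrame({"sentence": [['aa', 'bb', 'cc'], ['dd', 'ee', 'ff']]})

# Serialize to JSON text (lists become JSON arrays)
json_text = df.to_json()

# Read it back; the arrays are restored as Python lists
df2 = pd.read_json(io.StringIO(json_text))
sentences = df2["sentence"].tolist()
print(sentences)
```

Each element of sentences is a real list here, not a string, so no extra parsing step is needed.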