How to iterate over a dataframe parsed with spaCy after it was saved as a csv?

I created a data frame with spaCy (columns: sentences, tokens, stopwords, content words, pos, entities) and saved it as a csv. When I read the csv file back it looks fine, but when I run a for loop over a column, it does not return the result I expected (with my basic command of Python).

For example:

words = []
for items in df['tokens']:
    for word in items:
        words.append(word)

What I expected: [tea, and, ...]

What I got: ['t', 'e', 'a', ',', ' ', 'a', 'n', 'd', ...]

It happens in every column I try to iterate over. What is wrong with it? I need it to stay a csv, because it will be shared with classmates who use Excel to view the data.

CodePudding user response：

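from ast import literal_eval
import pandas as pd

# Parse each cell of 'tokens' back into a Python list while reading
df = pd.read_csv('sample.csv', converters={'tokens': literal_eval})

print(df['tokens'].iloc[0][0])  # first token of the first row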

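A csv stores each list as plain text, so after reading it back every cell is a single string, and looping over a string yields its characters. literal_eval turns that text back into a list, after which your for loop will work as written above. This assumes the cells hold valid Python literals such as "['tea', 'and']"; if the lists held spaCy Token objects rather than strings, the saved text will likely have no quotes and literal_eval will not be able to parse it.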
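To see the effect end to end, here is a minimal sketch of the round trip. The data frame, its values and the `list_columns` name are made up for illustration (they stand in for the spaCy output), and the csv is kept in memory with `io.StringIO` rather than written to `sample.csv`. It also shows one way the same converter might be applied to several columns, since the problem appears in every list column.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical data standing in for the spaCy output (tokens stored as strings).
original = pd.DataFrame({
    "sentences": ["tea and coffee", "coffee with milk"],
    "tokens": [["tea", "and", "coffee"], ["coffee", "with", "milk"]],
    "pos": [["NOUN", "CCONJ", "NOUN"], ["NOUN", "ADP", "NOUN"]],
})

# to_csv() with no path returns the csv text instead of writing a file.
csv_text = original.to_csv(index=False)

# Without converters, each list comes back as one string.
as_text = pd.read_csv(io.StringIO(csv_text))
first_cell_type_plain = type(as_text.loc[0, "tokens"])

# With converters, literal_eval rebuilds the list for every listed column.
list_columns = ["tokens", "pos"]  # assumed names; adjust to your own columns
restored = pd.read_csv(
    io.StringIO(csv_text),
    converters={name: literal_eval for name in list_columns},
)

# The nested loop from the question, written as a comprehension.
words = [word for items in restored["tokens"] for word in items]
print(first_cell_type_plain)
print(words)
```

The csv itself is unchanged by this, so it can still be opened in Excel; only the way pandas reads it differs.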

CodePudding user response：

I didn't have a minimal reproducible example, so I created one:

import pandas as pd
df = pd.DataFrame(
     { 'tokens': [['tea','and','coffee'],['coffee','with','milk']]})

words = []
for items in df['tokens']:
    for word in items:
        words.append(word)

Please show us your df['tokens'] and give us a minimal reproducible example, and I will edit accordingly.
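In the meantime, the example above only reproduces the problem once the frame has passed through a csv. Here is a rough sketch of that, assuming the tokens were saved as plain strings; the csv is kept in memory, and the names `reloaded`, `chars` and `flat_words` are only for illustration. It converts the column after loading rather than during `read_csv`, and flattens it with `explode()` instead of a nested loop.

```python
import io
from ast import literal_eval

import pandas as pd

df = pd.DataFrame(
    {"tokens": [["tea", "and", "coffee"], ["coffee", "with", "milk"]]}
)

# Round-trip through csv text (in memory) to reproduce the problem.
reloaded = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Each cell is now a string, so iterating it walks its characters.
chars = list(reloaded.loc[0, "tokens"])[:3]

# Convert the column after loading, then flatten one token per row.
reloaded["tokens"] = reloaded["tokens"].apply(literal_eval)
flat_words = reloaded["tokens"].explode().tolist()

print(chars)
print(flat_words)
```

If `apply(literal_eval)` raises an error on your real data, the cells are probably not valid Python literals, and seeing a few raw values of df['tokens'] would help narrow that down.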