I have a pandas Series in which each value is a string that looks like a list of sentences. A sample is below:
import pandas as pd
d = {1: "['f, they have everything i am looking for.', 'd has a lot of different options, and they carry every size needed.', 'q, i always find what I am looking for']",
2: "['easy to navigate', 'fast and easy to use. very helpful when needed, would recommend! will definitely use in future', 'easy to use, very convenient']"
}
s = pd.Series(d)
Each list always holds three sentences. I would like to split every list so that each sentence ends up in its own column, like below:
d2 = [['f, they have everything i am looking for.', 'd has a lot of different options, and they carry every size needed.', 'q, i always find what I am looking for'], ['easy to navigate', 'fast and easy to use. very helpful when needed, would recommend! will definitely use in future', 'easy to use, very convenient']]
df = pd.DataFrame(d2, columns=['rep1', 'rep2', 'rep3'])
df
My attempts at using Series.str.split() have been unsuccessful.
Answer:
You could slice off the brackets and use str.split, but it's not very robust:
>>> s.str[2:-2].str.split(r"',\s*'", expand=True).add_prefix('rep')
rep0 ... rep2
1 f, they have everything i am looking for. ... q, i always find what I am looking for
2 easy to navigate ... easy to use, very convenient
[2 rows x 3 columns]
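To illustrate why the split approach is fragile, here is a minimal sketch using a made-up one-row Series (hypothetical data, not from the question). One sentence contains an apostrophe, so the list's repr quotes it with double quotes and the `',\s*'` pattern no longer matches every separator:

```python
import ast
import pandas as pd

# Hypothetical sample: the first sentence contains an apostrophe,
# so Python's repr wraps it in double quotes instead of single quotes.
s_bad = pd.Series({1: """["it's easy", 'fast', 'cheap']"""})

# Same slicing and splitting as above; only one separator matches.
split_result = s_bad.str[2:-2].str.split(r"',\s*'", expand=True)
print(split_result.shape[1])  # 2 columns instead of 3

# Parsing the string as a Python literal recovers all three sentences.
parsed = s_bad.apply(ast.literal_eval)
print(len(parsed[1]))  # 3
```

The split result also keeps stray quote characters in the first cell, which is another sign that pattern matching on the raw string is not a reliable parser here.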
The robust way to do it is ast.literal_eval, and then some pivoting:
>>> import ast
>>> df = s.apply(ast.literal_eval).explode().rename('val').reset_index()
>>> df = df.join(df.groupby('index').cumcount().rename('rep').add(1)).pivot(index='index', columns='rep', values='val').add_prefix('rep')
>>> df
rep rep1 ... rep3
index ...
1 f, they have everything i am looking for. ... q, i always find what I am looking for
2 easy to navigate ... easy to use, very convenient
[2 rows x 3 columns]
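If every list is known to hold exactly three sentences, as the question states, a shorter alternative to the explode and pivot steps is to build the DataFrame directly from the parsed lists. This is a different technique from the one above, shown as a sketch with shortened, made-up sample strings; the column names `rep1` to `rep3` come from the question's desired output:

```python
import ast
import pandas as pd

# Shortened stand-in for the question's data (hypothetical values).
d = {
    1: "['a, b.', 'c d', 'e']",
    2: "['x', 'y. z, w', 'u, v']",
}
s = pd.Series(d)

# Turn each string into a real Python list.
lists = s.apply(ast.literal_eval)

# One row per list, one column per sentence.
# Assumes every list has exactly three items; otherwise the
# column count will not match and pandas raises an error.
df = pd.DataFrame(lists.tolist(), index=s.index,
                  columns=['rep1', 'rep2', 'rep3'])
print(df)
```

The explode and pivot version copes with lists of varying length, so it remains the safer choice when the three-sentence assumption might not hold.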