I have this dataset:
import pandas as pd
mydf = pd.DataFrame({'source': ['a', 'b', 'a', 'b'],
                     'text': ['November rain', 'Sweet child omine', 'Paradise City', 'Patience']})
mydf
source text
0 a November rain
1 b Sweet child omine
2 a Paradise City
3 b Patience
I want to split the strings in the text column into separate words, one word per row, keeping each word's source. This is the expected result:
source text
0 a November
1 a rain
2 b Sweet
3 b child
4 b omine
5 a Paradise
6 a City
7 b Patience
This is what I have tried:
mydf['text'] = mydf['text'].str.split(expand=True)
But it raises an error:
ValueError: Columns must be same length as key
What am I doing wrong? Is there a way to do this without creating an index?
Answer:
str.split(expand=True) returns a DataFrame, usually with more than one column (one per word position), so you can't assign it back to the single column text:
# output of `str.split(expand=True)`
0 1 2
0 November rain None
1 Sweet child omine
2 Paradise City None
3 Patience None None
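A small reproduction of the failure, as a sketch assuming a reasonably recent pandas. The exact error message differs between pandas versions, so the sketch only checks that a ValueError is raised when the three-column result is assigned to one column:

```python
import pandas as pd

mydf = pd.DataFrame({'source': ['a', 'b', 'a', 'b'],
                     'text': ['November rain', 'Sweet child omine', 'Paradise City', 'Patience']})

# expand=True gives one column per word position, padded with missing values
parts = mydf['text'].str.split(expand=True)
print(parts.shape)  # (4, 3): 4 rows, at most 3 words in any row

# Assigning a 3-column frame to the single column 'text' is what fails
try:
    mydf['text'] = parts
    failed = False
except ValueError as err:
    failed = True
    print('ValueError:', err)
```

The number of columns depends on the longest string ("Sweet child omine" has three words), which is why the shape never matches a single target column.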
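What you want is to split each string into a list of words and then explode the lists into rows:
# expand=False is the default, so this gives a Series of lists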
mydf['text'] = mydf['text'].str.split()
mydf = mydf.explode('text')
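To make the explode step less of a black box, here is a sketch that builds the same rows by hand with a plain loop. The names lists, manual and exploded are just illustrative; the loop is an approximation of explode's documented behaviour (one row per list element, with the index label and the other columns repeated), not pandas' actual implementation:

```python
import pandas as pd

mydf = pd.DataFrame({'source': ['a', 'b', 'a', 'b'],
                     'text': ['November rain', 'Sweet child omine', 'Paradise City', 'Patience']})

# Step 1: split each string into a list of words (expand=False is the default)
lists = mydf['text'].str.split()

# Step 2: roughly what explode does, written out by hand: one output row per
# list element, repeating the original index label and the source value.
# (explode would emit NaN for an empty list; this loop would skip it, which
# does not matter for this data.)
labels, sources, words = [], [], []
for idx, src, word_list in zip(mydf.index, mydf['source'], lists):
    for word in word_list:
        labels.append(idx)
        sources.append(src)
        words.append(word)
manual = pd.DataFrame({'source': sources, 'text': words}, index=labels)

exploded = mydf.assign(text=lists).explode('text')
print(manual.index.tolist())  # [0, 0, 1, 1, 1, 2, 2, 3]
print(manual.values.tolist() == exploded.values.tolist())  # True
```

The repeated index labels (0, 0, 1, 1, 1, ...) come straight from this "copy the label for every element" behaviour.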
You can also do it in a single chain with assign:
mydf.assign(text=mydf['text'].str.split()).explode('text')
Output:
source text
0 a November
0 a rain
1 b Sweet
1 b child
1 b omine
2 a Paradise
2 a City
3 b Patience
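The output above keeps the original index labels, while the expected result in the question is numbered 0 to 7. One way to get that, assuming pandas 1.1 or later (where explode gained the ignore_index parameter), is sketched below, together with reset_index(drop=True) as the equivalent for older versions:

```python
import pandas as pd

mydf = pd.DataFrame({'source': ['a', 'b', 'a', 'b'],
                     'text': ['November rain', 'Sweet child omine', 'Paradise City', 'Patience']})

# ignore_index=True (pandas >= 1.1) relabels the exploded rows 0..n-1
out = mydf.assign(text=mydf['text'].str.split()).explode('text', ignore_index=True)

# On older pandas: explode first, then drop the repeated labels
out_old = (mydf.assign(text=mydf['text'].str.split())
               .explode('text')
               .reset_index(drop=True))

print(out)
```

Both variants produce the same rows as the answer's version; only the index differs.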