Split text expanding rows in Pandas

Time:07-25

I have this dataset:

import pandas as pd

mydf = pd.DataFrame({'source':['a','b','a','b'],
                     'text':['November rain','Sweet child omine','Paradise City','Patience']})
mydf

    source  text
0   a       November rain
1   b       Sweet child omine
2   a       Paradise City
3   b       Patience

I want to split the text in the text column so that each word gets its own row. This is the expected result:

    source  text
0   a       November 
1   a       rain
2   b       Sweet 
3   b       child 
4   b       omine
5   a       Paradise 
6   a       City
7   b       Patience

This is what I have tried:

mydf['text'] = mydf['text'].str.split(expand=True)

But it raises an error:

ValueError: Columns must be same length as key

What am I doing wrong? Is there a way to do this without creating an index?

CodePudding user response:

str.split(expand=True) returns a DataFrame with one column per word position (three columns here), so it can't be assigned back to the single text column:

# output of `str.split(expand=True)`
          0      1      2
0  November   rain   None
1     Sweet  child  omine
2  Paradise   City   None
3  Patience   None   None
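To see the mismatch for yourself, you can compare the two forms of the split result. This is only a minimal sketch on the question's data; the names parts and lists are just for illustration:

```python
import pandas as pd

mydf = pd.DataFrame({'source': ['a', 'b', 'a', 'b'],
                     'text': ['November rain', 'Sweet child omine',
                              'Paradise City', 'Patience']})

# expand=True spreads the words across columns: one column per word position
parts = mydf['text'].str.split(expand=True)
print(parts.shape)  # (4, 3): four rows, three columns

# expand=False (the default) keeps a single Series holding one list per row
lists = mydf['text'].str.split()
print(lists.iloc[1])  # ['Sweet', 'child', 'omine']
```

The three-column frame has nowhere to go when the target is one column, while the Series of lists fits into a single column.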

I think you want to split each string into a list of words and then explode that list into rows:

# expand=False is the default, so each row becomes a list of words
mydf['text'] = mydf['text'].str.split()
# explode gives every list element its own row
mydf = mydf.explode('text')

You can also chain with assign:

mydf.assign(text=mydf['text'].str.split()).explode('text')

Output:

  source      text
0      a  November
0      a      rain
1      b     Sweet
1      b     child
1      b     omine
2      a  Paradise
2      a      City
3      b  Patience
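
Note that explode repeats the original index labels (0, 0, 1, ...). If you want the 0 to 7 numbering from the expected result, one option is to pass ignore_index=True, which as far as I know needs pandas 1.1 or later; on older versions, calling .reset_index(drop=True) after explode should give the same thing. A sketch on the question's data, where result is just an illustrative name:

```python
import pandas as pd

mydf = pd.DataFrame({'source': ['a', 'b', 'a', 'b'],
                     'text': ['November rain', 'Sweet child omine',
                              'Paradise City', 'Patience']})

result = (
    mydf.assign(text=mydf['text'].str.split())  # one list of words per row
        .explode('text', ignore_index=True)     # one word per row, fresh 0..n-1 index
)
print(result)
```

This leaves mydf itself untouched, since assign returns a new DataFrame.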