How to split one row into multiple rows in a generic way in pandas?

Time:10-18

name           text       group
a|b            a test     m|l|n

I have a DataFrame like the one above. If a column value contains a delimiter, I want to split it and put each part on a separate row.
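In pandas terms, splitting a value on a delimiter and giving each part its own row is Series.str.split followed by explode. A minimal sketch on the example's group value, assuming the delimiter is the literal character '|':

```python
import pandas as pd

# the 'group' value from the example row
s = pd.Series(['m|l|n'])

# str.split turns each string into a list; a one-character pattern is split literally
parts = s.str.split('|')

# explode gives every list element its own row and repeats the original index label
rows = parts.explode()
print(rows.tolist())  # ['m', 'l', 'n']
```

All three rows keep the index label 0, which is why the exploded DataFrame later shows 0 on every line.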

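import pandas as pd
df = pd.DataFrame({'name': ['a|b'], 'text': ['a test'], 'group': ['m|l|n']})
delimiter = '|'
columns = ['name', 'text', 'group']
for column in columns:
    # works, but needs a separate branch for each hard-coded column name
    if column == 'name' and column in df:
        df = df.assign(name=df.name.str.split(delimiter)).explode(column)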

The problem with this code is that I have to write a separate if for each column, testing the actual column name string, i.e. 'name'. I want a general way, like below:

if column in df:
    df = df.assign(column=df.column.str.split(delimiter)).explode(column)

But this does not work. Is there a workaround?

Answer:

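Use [] indexing instead of dot notation, and pass the column name to assign by unpacking a dict (**{column: ...}), because a keyword argument such as column=... always creates a column literally named 'column':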

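import pandas as pd

df = pd.DataFrame({'name': ['a|b'], 'text': ['a test'], 'group': ['m|l|n']})
delimiter = '|'
column = 'group'

if column in df:
    # df[column] works with a variable; **{column: ...} builds the keyword from it
    df = df.assign(**{column: df[column].str.split(delimiter)}).explode(column)
print(df)

  name    text group
0  a|b  a test     m
0  a|b  a test     l
0  a|b  a test     n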
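The dotted version in the question fails because names written in code are taken literally: assign(column=...) adds a column called 'column' whatever the variable column holds, and df.column looks for a column named 'column'. Unpacking a dict passes the variable's value as the key instead. A small sketch contrasting the two (the variable names literal and dynamic are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['a|b'], 'text': ['a test'], 'group': ['m|l|n']})
column = 'group'

# keyword form: the argument name is the literal text "column"
literal = df.assign(column=df[column].str.split('|'))
print(literal.columns.tolist())  # ['name', 'text', 'group', 'column']

# dict unpacking: the key is the value of the variable, i.e. 'group'
dynamic = df.assign(**{column: df[column].str.split('|')})
print(dynamic.columns.tolist())  # ['name', 'text', 'group']
print(dynamic.loc[0, 'group'])   # ['m', 'l', 'n']

# attribute access is literal too: there is no column named 'column'
try:
    df.column
except AttributeError as exc:
    print('AttributeError:', exc)
```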

Another option, if you need to explode multiple columns:

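# keep only the names from the columns list that exist in df.columns
cols = df.columns.intersection(columns)
print(cols)

# assign the split columns back via a dict comprehension, then explode all of them at once;
# explode needs a list here (an Index raises ValueError), and every row must have
# the same number of parts in each of these columns
df = df.assign(**{x: df[x].str.split(delimiter) for x in cols}).explode(list(cols))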
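Exploding several columns in one call only works when each row's lists have matching lengths. In the example, name splits into 2 parts and group into 3, so that call raises a ValueError. If you want the behaviour of the loop in the question instead, where each column is exploded in turn and the result contains every combination of the parts, a sketch could look like this (explode_columns is a hypothetical helper name, not a pandas function):

```python
import pandas as pd

def explode_columns(df, columns, delimiter):
    """Split and explode each listed column of df, one column after another."""
    for col in columns:
        if col in df:  # skip names that are not columns of df
            # dict unpacking lets the column name come from a variable
            df = df.assign(**{col: df[col].str.split(delimiter)}).explode(col)
    return df

df = pd.DataFrame({'name': ['a|b'], 'text': ['a test'], 'group': ['m|l|n']})
out = explode_columns(df, ['name', 'text', 'group', 'missing'], '|')
print(out)
```

Here name is exploded first (2 rows) and then group (3 parts per row), giving 6 rows; 'missing' is skipped, just like the "if column in df" check in the question.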