name text group
a|b a test m|l|n
I have a DataFrame like the one above. If a column value contains a delimiter, I want to split it and put each piece on a separate row.
columns = ['name', 'text', 'group']
for column in columns:
    if column == 'name' and column in df:
        df = df.assign(name=df.name.str.split(delimiter)).explode(column)
The problem with this code is that I have to use multiple if statements to test the actual column name string, i.e. 'name'. I want a general approach like the one below:
if column in df:
    df = df.assign(column=df.column.str.split(delimiter)).explode(column)
But this is invalid. Is there any workaround?
CodePudding user response:
Use [] instead of dot notation, and pass the column name to assign by unpacking a dict:
delimiter = '|'
column = 'group'
if column in df:
    df = df.assign(**{column: df[column].str.split(delimiter)}).explode(column)
print(df)
  name    text group
0  a|b  a test     m
0  a|b  a test     l
0  a|b  a test     n
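To cover the loop from the question, the same idea can be wrapped in a small helper that handles each column name in turn. This is only a sketch: the name split_and_explode is made up for illustration, the sample frame is rebuilt from the question, and ignore_index=True assumes pandas 1.1 or newer.

```python
import pandas as pd


def split_and_explode(df, columns, delimiter="|"):
    """Split each listed column on `delimiter` and give every piece its own row.

    Hypothetical helper, not part of pandas.
    """
    for column in columns:
        if column in df:  # skip names that are not columns of df
            df = df.assign(
                **{column: df[column].str.split(delimiter)}
            ).explode(column, ignore_index=True)  # renumber rows after each step
    return df


# Sample data rebuilt from the question
df = pd.DataFrame({"name": ["a|b"], "text": ["a test"], "group": ["m|l|n"]})
result = split_and_explode(df, ["name", "text", "group"])
print(result)
```

Because the columns are exploded one after another, every piece of name is combined with every piece of group, so the single input row becomes six rows here.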
Another idea, if you need to explode multiple columns:
# keep only the names from the columns list that exist in df.columns
cols = df.columns.intersection(columns)
print(cols)
# assign the split columns back via a dict comprehension, then explode all of them together
# (a list of columns needs pandas 1.3+, and each row's lists must have matching lengths)
df = df.assign(**{x: df[x].str.split(delimiter) for x in cols}).explode(list(cols))
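Exploding several columns in one call pairs the pieces up position by position, so it only works when the split lists in a row have the same length; otherwise pandas raises a ValueError. Below is a minimal sketch of that behaviour, assuming pandas 1.3 or newer. The sample data and the name "missing" are invented for illustration.

```python
import pandas as pd

delimiter = "|"
columns = ["name", "group", "missing"]  # "missing" is deliberately not in df

# Illustrative data: name and group split into the same number of pieces per row
df = pd.DataFrame(
    {"name": ["a|b", "c"], "text": ["a test", "other"], "group": ["m|l", "n"]}
)

# Keep only the requested names that are real columns, as a plain list
cols = list(df.columns.intersection(columns))

# Split every selected column, then explode them together (pieces are paired by position)
paired = df.assign(
    **{x: df[x].str.split(delimiter) for x in cols}
).explode(cols, ignore_index=True)
print(paired)
```

Here "a" is paired with "m" and "b" with "l". If the lengths can differ, as in the question's sample row, exploding one column at a time is the safer route.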