Splitting a pandas column on multiple delimiters and then converting them into rows-CodePudding

Currently my dataset is as:

ID	value
1	str1/value1,str2/value21,value22,value23,str3/value31,value32
2	str4/value4,str5/value51,value52,value53,str6/value61,value62

I want to first split data on a comma (',') and then on the ('/'), but I want to keep all the occurrences after '/' with a comma. Also, I need to split them into row values.

The output should be:

ID	str	value
1	str1	value1
1	str2	value21,value22,value23
1	str3	value31,value32
2	str4	value4
2	str5	value51,value52,value53
2	str6	value61,value62

Please can somebody help me do it in an efficient way, instead of looping it for each and every occurrence.

CodePudding user response：

IIUC, you could use a double split with a regex:

df['str'] = df['value'].str.split(r',(?=[^/,] /)')
df = df.explode('str')
df[['str', 'value']] = df['str'].str.split('/', expand=True)

output:

   ID                    value   str
0   1                   value1  str1
0   1  value21,value22,value23  str2
0   1          value31,value32  str3
1   2                   value4  str4
1   2  value51,value52,value53  str5
1   2          value61,value62  str6

Or, generating a new dataframe, with extractall and named capturing groups a join:

df2 = (df[['ID']]
       .join(df['value']
             .str.extractall(r'(?P<str>[^,/] )/(?P<value>[^/] ?)(?=,[^,] /|$)')
             .droplevel(1))
      )

output:

   ID   str                    value
0   1  str1                   value1
0   1  str2  value21,value22,value23
0   1  str3          value31,value32
1   2  str4                   value4
1   2  str5  value51,value52,value53
1   2  str6          value61,value62