Home > Net >  Splitting a pandas column on multiple delimiters and then converting them into rows
Splitting a pandas column on multiple delimiters and then converting them into rows

Time:03-31

Currently my dataset is as:

ID value
1 str1/value1,str2/value21,value22,value23,str3/value31,value32
2 str4/value4,str5/value51,value52,value53,str6/value61,value62

I want to first split data on a comma (',') and then on the ('/'), but I want to keep all the occurrences after '/' with a comma. Also, I need to split them into row values.

The output should be:

ID str value
1 str1 value1
1 str2 value21,value22,value23
1 str3 value31,value32
2 str4 value4
2 str5 value51,value52,value53
2 str6 value61,value62

Please can somebody help me do it in an efficient way, instead of looping it for each and every occurrence.

CodePudding user response:

IIUC, you could use a double split with a regex:

df['str'] = df['value'].str.split(r',(?=[^/,] /)')
df = df.explode('str')
df[['str', 'value']] = df['str'].str.split('/', expand=True)

output:

   ID                    value   str
0   1                   value1  str1
0   1  value21,value22,value23  str2
0   1          value31,value32  str3
1   2                   value4  str4
1   2  value51,value52,value53  str5
1   2          value61,value62  str6

Or, generating a new dataframe, with extractall and named capturing groups a join:

df2 = (df[['ID']]
       .join(df['value']
             .str.extractall(r'(?P<str>[^,/] )/(?P<value>[^/] ?)(?=,[^,] /|$)')
             .droplevel(1))
      )

output:

   ID   str                    value
0   1  str1                   value1
0   1  str2  value21,value22,value23
0   1  str3          value31,value32
1   2  str4                   value4
1   2  str5  value51,value52,value53
1   2  str6          value61,value62

enter image description here

  • Related