Pandas Column Split but ignore splitting on specific pattern

Time:03-15

I have a pandas Series containing strings in several patterns, as below:

import pandas as pd

stringsToSplit = ['6  Wrap',
                  '1  Salad , 2  Pepsi , 2  Chicken Wrap',
                  '1  Kebab Plate  [1  Bread ]',
                  '1 Beyti Kebab , 1  Chicken Plate  [1  Bread ], 1 Kebab Plate  [1  White Rice ], 1 Tikka Plate  [1  Bread ]',
                  '1 Kebab Plate [1  Bread , 1  Rocca Leaves ], 1  Mountain Dew '
                 ]

s = pd.Series(stringsToSplit)
s

0                                              6  Wrap
1                1  Salad , 2  Pepsi , 2  Chicken Wrap
2                          1  Kebab Plate  [1  Bread ]
3    1 Beyti Kebab , 1  Chicken Plate  [1  Bread ],...
4    1 Kebab Plate [1  Bread , 1  Rocca Leaves ], 1...
dtype: object

I would like to split and explode it such that the result would be as follows:

0    6  Wrap
1    1  Salad
1    2  Pepsi
1    2  Chicken Wrap
2    1  Kebab Plate [1  Bread ]
3    1 Beyti Kebab
3    1  Chicken Plate  [1  Bread ]
3    1 Kebab Plate  [1  White Rice ]
3    1  Tikka Plate  [1  Bread ]
4    1 Kebab Plate [1  Bread , 1  Rocca Leaves ]
4    1  Mountain Dew

To explode the Series, I first need to split each string. However, split(',') also splits on the commas inside the square brackets, which I do not want. I tried splitting with a regex but could not find the right pattern.

I would appreciate the support.

CodePudding user response:

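You can split on a regex with a negative lookahead, so that a comma only counts as a separator when it is not followed by a closing bracket before any other bracket: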

s.str.split(r'\s*,(?![^\[\]]*\])').explode()

output:

0                                        6  Wrap
1                                       1  Salad
1                                       2  Pepsi
1                                2  Chicken Wrap
2                    1  Kebab Plate  [1  Bread ]
3                                  1 Beyti Kebab
3                  1  Chicken Plate  [1  Bread ]
3                1 Kebab Plate  [1  White Rice ]
3                     1 Tikka Plate  [1  Bread ]
4    1 Kebab Plate [1  Bread , 1  Rocca Leaves ]
4                               1  Mountain Dew 
dtype: object
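
To see what the lookahead is doing outside of pandas, here is a minimal sketch that applies the same pattern with the standard re module. The helper name split_outside_brackets is hypothetical (it is not part of pandas or of the answer above), and the sketch assumes brackets are balanced and never nested.

```python
import re

# Same pattern as in the answer: a comma, plus any whitespace before it,
# that is NOT followed by a "]" before the next "[" or "]" appears.
# A comma inside [...] always has such a "]" ahead of it, so it is skipped.
PATTERN = re.compile(r'\s*,(?![^\[\]]*\])')


def split_outside_brackets(text):
    """Split on commas outside [...] and trim each piece (hypothetical helper)."""
    return [part.strip() for part in PATTERN.split(text)]


order = '1 Kebab Plate [1  Bread , 1  Rocca Leaves ], 1  Mountain Dew '
print(split_outside_brackets(order))
# ['1 Kebab Plate [1  Bread , 1  Rocca Leaves ]', '1  Mountain Dew']
```

The pattern only consumes whitespace before the comma, so each piece after the first keeps its leading space. If that matters, chaining .str.strip() after .explode() in the pandas version should trim it in the same way the helper does here. With nested brackets the lookahead would no longer be reliable, and a small parser would probably be a safer choice.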

