Separating lowercase and uppercase letters with a comma in a pandas Series

Time:04-10

I have a pandas Series:

import pandas as pd

list_df = pd.Series(['KingsDuck',
                     'RangersIslandersDevils',
                     'Shark',
                     'Maple Leafs',
                     'Red Wing'])

display(list_df)
0                 KingsDuck
1    RangersIslandersDevils
2                     Shark
3               Maple Leafs
4                  Red Wing
dtype: object

and I would like to insert a comma between a lowercase character and an uppercase character that follows it (e.g. 'KingsDuck' to 'Kings,Duck' and 'RangersIslandersDevils' to 'Rangers,Islanders,Devils').

I tested my regex with an online Python regex tool and it worked as intended: regextesting

However, when I tried the regex in my Jupyter Notebook, the output was not what I expected:

list_df.replace(r'(([a-z])([A-Z]))',r'\1,\2', regex=True)
0                   KingsD,suck
1    RangersI,sslandersD,sevils
2                         Shark
3                   Maple Leafs
4                      Red Wing
dtype: object

How do I go about this?

Answer:

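You have too many groups; remove the outer parentheses. With ((a)(b)), \1 is ab, \2 is a, and \3 is b. Your replacement \1,\2 therefore writes the whole matched pair, a comma, and then the lowercase letter again, which is why 'KingsDuck' became 'KingsD,suck'.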
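To see the group numbering concretely, here is a small check with Python's standard `re` module on a single string. It assumes that `Series.replace(..., regex=True)` numbers groups the same way `re.sub` does, which is consistent with the output shown in the question.

```python
import re

text = "KingsDuck"

# The outer parentheses form group 1, so the inner groups become 2 and 3.
m = re.search(r"(([a-z])([A-Z]))", text)
print(m.groups())

# \1 is the whole pair "sD" and \2 is just "s", so "sD" is replaced by "sD,s".
buggy = re.sub(r"(([a-z])([A-Z]))", r"\1,\2", text)

# Fix 1: drop the outer group, so \1 and \2 are the two letters.
fixed_inner = re.sub(r"([a-z])([A-Z])", r"\1,\2", text)

# Fix 2: keep the outer group but reference the inner groups \2 and \3.
fixed_outer = re.sub(r"(([a-z])([A-Z]))", r"\2,\3", text)

print(buggy, fixed_inner, fixed_outer)
```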

list_df.replace(r'([a-z])([A-Z])',
                r'\1,\2', regex=True)

Or, if you really want to keep the outer group, reference the inner groups instead:

list_df.replace(r'(([a-z])([A-Z]))',
                r'\2,\3', regex=True)

Output:

0                  Kings,Duck
1    Rangers,Islanders,Devils
2                       Shark
3                 Maple Leafs
4                    Red Wing
dtype: object
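An alternative not taken from the answer above: match the zero-width position between a lowercase and an uppercase letter with a lookbehind and a lookahead, so no capture groups or backreferences are needed at all. This is a sketch assuming a reasonably recent pandas. The pattern is passed pre-compiled, on the assumption that this keeps pandas on Python's `re` engine (lookarounds are not supported by every regex backend), and `regex=True` is passed explicitly because pandas 2.0 changed the default of `Series.str.replace` to `regex=False`.

```python
import re

import pandas as pd

list_df = pd.Series(['KingsDuck',
                     'RangersIslandersDevils',
                     'Shark',
                     'Maple Leafs',
                     'Red Wing'])

# Zero-width match: a position preceded by a lowercase letter (lookbehind)
# and followed by an uppercase letter (lookahead). Nothing is consumed,
# so the replacement is just the comma.
boundary = re.compile(r'(?<=[a-z])(?=[A-Z])')
commas = list_df.str.replace(boundary, ',', regex=True)
print(commas)
```

Entries like 'Maple Leafs' and 'Red Wing' are left unchanged because the character before the uppercase letter is a space, not a lowercase letter.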