How to split string at any number followed by a period instead of a fixed delimiter

input:

string="1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"

expected output:

[
    "1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking", 
    "2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering",
    ...
]

Attempt: I have tried something like string.split(range(0,5) "."), but that does not work. What would be the best way to do this?

CodePudding user response：

I don't usually reach for regular expressions first, but this cries out for re.split.

import re

parts = re.split(r'(\d\.)', string)

This does need a bit of post-processing. It creates:

['', '1.', 'Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking', '2.', 'Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering', ...

So you'll need to combine every other element: each number with the text that follows it.
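
One way to do that post-processing, as a rough sketch rather than the only option: skip the leading empty string, then pair each captured number with the text after it. The short `string` here is a made-up sample in the same shape as the question's input, and `records` is just an illustrative name.

```python
import re

# Shortened, made-up sample in the same shape as the question's input.
string = "1.Adam-Lee-Dotnet2.Peter-Smith-Salesforce3.Harrison-Lu-Java"

# The capture group keeps the "N." delimiters in the result list.
parts = re.split(r'(\d\.)', string)

# parts[0] is the empty string before the first delimiter; after that the
# list alternates delimiter, body, delimiter, body, ...
records = [number + body for number, body in zip(parts[1::2], parts[2::2])]
print(records)
```

This assumes every chunk of text is preceded by a number, which holds for the input shown in the question.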

CodePudding user response：

You could split using a regex with lookaround assertions: (?=\d+\.) asserts one or more digits followed by a dot to the right, and (?<!^) asserts that the position is not the start of the string.

(?<!^)(?=\d+\.)

import re

pattern = r"(?<!^)(?=\d+\.)"
string="1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"                    

res = re.split(pattern, string)
print(res)

Output

[
 '1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking',
 '2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering',
 '3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering',
 '4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering',
 '5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management'
]
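
One caveat worth checking if the numbering can go past 9: a lookahead for one or more digits followed by a dot also matches in the middle of a number like "10.", so the split would cut between the "1" and the "0". A possible fix, sketched here on a made-up sample, is an extra (?<!\d) lookbehind so the split only happens at the first digit of a number. The names `sample`, `naive` and `guarded` are illustrative only. Splitting on empty matches needs Python 3.7 or later.

```python
import re

# Made-up sample with two-digit record numbers.
sample = "9.Ann-Lee10.Bob-Ray11.Cy-Poe"

# Lookahead-only split: also splits inside "10." and "11.".
naive = re.split(r"(?<!^)(?=\d+\.)", sample)

# Extra (?<!\d): do not split when the previous character is a digit.
guarded = re.split(r"(?<!^)(?<!\d)(?=\d+\.)", sample)

print(naive)
print(guarded)
```

The guarded version assumes a record never ends in a digit right before the next number; if it can, the input is ambiguous and no pattern can tell where the number starts.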

Or instead of splitting, you could also use a pattern to match 1 or more digits followed by a dot, and then match until the first occurrence of the same pattern or the end of the string.

\d+\..*?(?=\d+\.|$)

import re
 
pattern = r"\d+\..*?(?=\d+\.|$)"
string="1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"                    
 
res = re.findall(pattern, string)
print(res)
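
To see what the matching approach returns, here is a small sketch on a shortened, made-up sample (not the question's full input) that also includes a two-digit number. The names `sample` and `records` are illustrative.

```python
import re

# Shortened, made-up sample in the same shape as the question's input.
sample = "1.Adam-Lee-Dotnet2.Peter-Smith-Salesforce10.Larry-Ken-SQL"

# Match "N." and then as little as possible, up to the next "N." or the end.
pattern = r"\d+\..*?(?=\d+\.|$)"

records = re.findall(pattern, sample)
print(records)
```

Because each match starts at the first digit of a number and consumes all of its digits, multi-digit numbers such as "10." stay in one piece here. Note that . does not match a newline by default, so input spanning several lines would need the re.DOTALL flag.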