input:
string="1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"
expected output:
[
"1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking",
"2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering",
...
]
Attempt: I have tried using string.split(range(0,5) "."). What would be the best way to do this?
User response:
I don't usually reach for regular expressions first, but this cries out for re.split.
import re
# The capturing group keeps each "N." marker as its own list element
parts = re.split(r'(\d\.)', string)
This does need a bit of post-processing. It creates:
['', '1.', 'Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking', '2.', 'Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering', ...
So you'll need to join each "N." marker with the element that follows it, dropping the leading empty string.
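A minimal sketch of that post-processing, assuming the string variable from the question. Because parts[0] is the empty string, slicing with a step of 2 lines up every marker (odd indices) with the text after it (even indices from 2 on). The name records is just illustrative.

```python
import re

string = "1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"

# ['', '1.', 'Adam-...-Banking', '2.', 'Peter-...', ...]
parts = re.split(r'(\d\.)', string)

# parts[1::2] are the "N." markers, parts[2::2] the text after each one;
# skipping index 0 discards the leading empty string.
records = [marker + body for marker, body in zip(parts[1::2], parts[2::2])]

print(records)  # five strings, one per record
```

Note that \d matches a single digit, so this relies on the record numbers staying below 10.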
User response:
You could split using a regex with lookaround assertions: (?=\d+\.) asserts one or more digits followed by a dot to the right, and (?<!^) asserts that the position is not the start of the string.
(?<!^)(?=\d+\.)
import re
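# Zero-width match: split wherever "N." follows, except at the very start.
# Splitting on an empty match requires Python 3.7+.
pattern = r"(?<!^)(?=\d+\.)"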
string="1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"
res = re.split(pattern, string)
print(res)
Output
[
'1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking',
'2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering',
'3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering',
'4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering',
'5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management'
]
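To see what the (?<!^) lookbehind contributes, compare it with the lookahead alone. This small check uses only the first two records from the question (the variable names are illustrative). On Python 3.7+, the lookahead also matches at index 0, so without the lookbehind the result starts with an empty string.

```python
import re

# First two records from the question, concatenated as in the input
sample = (
    "1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking"
    "2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering"
)

without_lookbehind = re.split(r"(?=\d+\.)", sample)
with_lookbehind = re.split(r"(?<!^)(?=\d+\.)", sample)

print(without_lookbehind[0] == "")  # True: empty piece from the match at index 0
print(len(with_lookbehind))         # 2: just the two records
```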
Or instead of splitting, you could also use a pattern to match 1 or more digits followed by a dot, and then match until the first occurrence of the same pattern or the end of the string.
\d+\..*?(?=\d+\.|$)
import re
# "N." then as few characters as possible, up to the next "N." or the end of the string
pattern = r"\d+\..*?(?=\d+\.|$)"
string="1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering3.Harrison-Lu-Java-9223391910-HarrisonLu@gmail.com-Mumbai-INDIA-Samsung-Engineering4.Andrew-Joseph-Javascript-9200091910-AndrewJoseph@gmail.com-Toronto-CANADA-Dell-Engineering5.Larry-Ken-SQL-8880091910-LarryKen@gmail.com-Newyork-USA-HP-Management"
# findall returns every non-overlapping match as a list of strings
res = re.findall(pattern, string)
print(res)
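The lazy .*? is what makes this work. A quick comparison on the first two records from the question (variable names are illustrative): with a greedy .* the engine runs to the end of the string, where $ satisfies the lookahead, so the whole input comes back as a single match.

```python
import re

# First two records from the question, concatenated as in the input
sample = (
    "1.Adam-Lee-Dotnet-9191919191-AdamLee@gmail.com-London-UK-Oracle-Banking"
    "2.Peter-Smith-Salesforce-9222291910-PeterSmith21@gmail.com-Mumbai-INDIA-Oracle-Engineering"
)

lazy = re.findall(r"\d+\..*?(?=\d+\.|$)", sample)   # stops at the first "N." it reaches
greedy = re.findall(r"\d+\..*(?=\d+\.|$)", sample)  # runs to the end, then $ matches

print(len(lazy))    # 2: one match per record
print(len(greedy))  # 1: the whole string
```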