Splitting text and numbers and adding a separator

Time:11-30

I have a string of the form:

Abu Dhabi1.90Morrisville Samp Army1.90
Deccan Gladiators1.40The Chennai Braves2.87
Bangla Tigers1.90Delhi Bulls1.90
New Zealand1.68India2.15
Australia1.09Draw14.00West Indies13.00
Sri Lanka1.51Afghanistan2.50
Tas Tigers1.28South Australia3.50

Is there a regular expression that can be used so that the final output looks like this?

Abu Dhabi, 1.90, Morrisville Samp Army, 1.90
Deccan Gladiators, 1.40, The Chennai Braves, 2.87
Bangla Tigers, 1.90, Delhi Bulls, 1.90
New Zealand, 1.68, India, 2.15
Australia, 1.09, Draw, 14.00, West Indies, 13.00
Sri Lanka, 1.51, Afghanistan, 2.50
Tas Tigers, 1.28, South Australia, 3.50

CodePudding user response:

What about using (?<=[\d.])(?=[^\d.\n])|(?<=[^\d.])(?=[\d.]) to detect alternating numbers/non-numbers?

import re

text = '''Abu Dhabi1.90Morrisville Samp Army1.90
Deccan Gladiators1.40The Chennai Braves2.87
Bangla Tigers1.90Delhi Bulls1.90
New Zealand1.68India2.15
Australia1.09Draw14.00West Indies13.00
Sri Lanka1.51Afghanistan2.50
Tas Tigers1.28South Australia3.50'''

# Raw string so that \d reaches the regex engine unchanged.
print(re.sub(r'(?<=[\d.])(?=[^\d.\n])|(?<=[^\d.])(?=[\d.])', ', ', text))

Output:

Abu Dhabi, 1.90, Morrisville Samp Army, 1.90
Deccan Gladiators, 1.40, The Chennai Braves, 2.87
Bangla Tigers, 1.90, Delhi Bulls, 1.90
New Zealand, 1.68, India, 2.15
Australia, 1.09, Draw, 14.00, West Indies, 13.00
Sri Lanka, 1.51, Afghanistan, 2.50
Tas Tigers, 1.28, South Australia, 3.50
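
If it helps to see where the zero-width pattern actually matches, here is a minimal sketch that uses the same lookarounds with re.split on a single line. The helper name split_fields is just for illustration, and it assumes Python 3.7 or later, where re.split can split on empty matches.

```python
import re

# Same zero-width pattern as above: it matches the empty position
# between a digit/dot and a non-digit, or the other way round.
BOUNDARY = re.compile(r'(?<=[\d.])(?=[^\d.\n])|(?<=[^\d.])(?=[\d.])')

def split_fields(line):
    """Split one line at every text/number boundary (hypothetical helper)."""
    return BOUNDARY.split(line)

fields = split_fields('Australia1.09Draw14.00West Indies13.00')
print(fields)
print(', '.join(fields))
```

Joining the pieces with ', ' gives the same result as the re.sub call for that line.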


CodePudding user response:

You can use an alternation pattern to match either a run of letters and spaces followed by a digit, or a run of digits and dots followed by a letter, and substitute the match with itself followed by a comma and a space:

import re

s = '''Abu Dhabi1.90Morrisville Samp Army1.90
Deccan Gladiators1.40The Chennai Braves2.87
Bangla Tigers1.90Delhi Bulls1.90
New Zealand1.68India2.15
Australia1.09Draw14.00West Indies13.00
Sri Lanka1.51Afghanistan2.50
Tas Tigers1.28South Australia3.50'''

# \1 is the matched run; a comma and a space are appended after it.
print(re.sub(r'([A-Za-z ]+(?=\d)|[\d.]+(?=[A-Za-z]))', r'\1, ', s))

This outputs:

Abu Dhabi, 1.90, Morrisville Samp Army, 1.90
Deccan Gladiators, 1.40, The Chennai Braves, 2.87
Bangla Tigers, 1.90, Delhi Bulls, 1.90
New Zealand, 1.68, India, 2.15
Australia, 1.09, Draw, 14.00, West Indies, 13.00
Sri Lanka, 1.51, Afghanistan, 2.50
Tas Tigers, 1.28, South Australia, 3.50
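
If the goal is to work with the values rather than just print them, a variation on the same character classes can capture each name together with the number that follows it. This is only a sketch under assumptions: parse_odds is a made-up helper name, names are assumed to contain only ASCII letters and spaces, and numbers are assumed to look like 1.90 or 14.

```python
import re

# A run of letters/spaces immediately followed by a number
# (digits with an optional fractional part).
PAIR = re.compile(r'([A-Za-z ]+)(\d+(?:\.\d+)?)')

def parse_odds(line):
    """Return a list of (name, number) tuples for one line (hypothetical helper)."""
    return [(name.strip(), float(odds)) for name, odds in PAIR.findall(line)]

for name, odds in parse_odds('Australia1.09Draw14.00West Indies13.00'):
    print(f'{name}, {odds:.2f}')
```

Names containing digits, dots or non-ASCII letters would need a different pattern.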

Demo: https://replit.com/@blhsing/NewInternationalGravity
