I am processing some strings that look like:
- 0.08m
- 156.2km
Essentially each string is a number followed by units. I would like to use a regular expression to get the number part and the unit separately into a list.
I found this post which seems to only tackle the case where the number part has no decimals.
Any thoughts on how I could achieve this?
Note: I can't tell what the units will be beforehand, but I know that they will only contain lowercase or uppercase letters.
CodePudding user response:
Use re.search:
import re
data = ['0.08m', '100.5km']
for measure in data:
    # group 1: digits and dots; group 2: everything non-digit that follows
    match = re.search(r"([\d.]+)(\D+)", measure)
    print(match.groups())
Output
('0.08', 'm')
('100.5', 'km')
If the measurements are all in a single string, use re.finditer:
import re
measures = '0.08m,100.5km'
for match in re.finditer(r"([\d.]+)(\D+)", measures):
    print(match.groups())
Output
('0.08', 'm,')
('100.5', 'km')
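The trailing comma in the first unit above comes from \D, which matches any non-digit character, separators included. Since the question says the units only ever contain letters, a minimal sketch that restricts the unit group to [A-Za-z]+ could look like this (the helper name split_measures is hypothetical, not part of the answer above):

```python
import re

# Number part: digits and dots; unit part: ASCII letters only,
# as stated in the question. This is an assumption about the input.
MEASURE_RE = re.compile(r"([\d.]+)([A-Za-z]+)")

def split_measures(text):
    """Return a (number, unit) tuple for every measurement found in text."""
    return [m.groups() for m in MEASURE_RE.finditer(text)]

print(split_measures('0.08m,100.5km'))
# [('0.08', 'm'), ('100.5', 'km')]
```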
CodePudding user response:
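You can use this regular expression: ([0-9.]+) with Python's re.split. It splits on any run of digits and dots, and because the pattern is wrapped in a capturing group, the number itself is kept in the result alongside the unit.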
Example
>>> import re
>>> s = '125.5km'
>>> re.split(r'([0-9.]+)', s)
['', '125.5', 'km']
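The empty string at the front of that list appears because the number matches at the very start of the input, so re.split reports an empty piece before it. If only the number and the unit are wanted, one possible sketch is to filter out the empty pieces (split_number_unit is a hypothetical helper name):

```python
import re

def split_number_unit(s):
    # The capturing group makes re.split keep the number in the result.
    # Drop the empty strings produced when a match sits at the start
    # or end of the input.
    return [part for part in re.split(r'([0-9.]+)', s) if part]

print(split_number_unit('125.5km'))
# ['125.5', 'km']
```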
CodePudding user response:
If you are trying to match decimals, but not other values like IP addresses or version numbers, which have multiple periods, you can try this:
import re
# match either '123.456', '123', or '.456' format, followed by the unit
regex = re.compile(r'^(\d+\.\d+|\d+|\.\d+)\w+$')
units = [ "0.08m", "100.5km", "10kg", ".15l" ]
for unit in units:
    results = re.findall(regex, unit)
    print(unit, results)
# values with more than one period should not match
badunits = [ "127.0.0.1ip", "1.2.3version", "4.5.6" ]
for unit in badunits:
    results = re.findall(regex, unit)
    print(unit, results)
## output
# 0.08m ['0.08']
# 100.5km ['100.5']
# 10kg ['10']
# .15l ['.15']
# 127.0.0.1ip []
# 1.2.3version []
# 4.5.6 []
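The pattern above captures only the number. To get the unit as well, as the question asks, a second capturing group can be added. The following is a sketch under the assumption from the question that units consist only of letters; parse_measure is a hypothetical helper name, and re.fullmatch is used in place of the ^ and $ anchors:

```python
import re

# Group 1: '123.456', '123', or '.456'; group 2: letters-only unit.
STRICT_RE = re.compile(r'(\d+\.\d+|\d+|\.\d+)([A-Za-z]+)')

def parse_measure(s):
    """Return [number, unit], or None if s is not a single measurement."""
    m = STRICT_RE.fullmatch(s)
    return list(m.groups()) if m else None

for s in ["0.08m", "156.2km", ".15l", "127.0.0.1ip", "4.5.6"]:
    print(s, parse_measure(s))
```

Inputs with several periods, such as "127.0.0.1ip", yield None here, so callers should check for None before unpacking the result.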