How can I split a number (which has decimal places) and letters at the end into a list with regular expressions?

I am processing some strings that look like:

0.08m
156.2km

Essentially each string is a number followed by units. I would like to use a regular expression to get the number part and the unit separately into a list.

I found this post which seems to only tackle the case where the number part has no decimals.

Any thoughts on how I could achieve this?

Note: I can't tell what the units will be beforehand, but I know that they will only contain lowercase or uppercase letters.

CodePudding user response：

Use re.search:

import re

data = ['0.08m', '100.5km']

for measure in data:
    match = re.search(r"([\d.]+)(\D+)", measure)
    print(match.groups())

Output

('0.08', 'm')
('100.5', 'km')

If in a single string, use re.finditer:

measures = '0.08m,100.5km'
for match in re.finditer(r"([\d.]+)(\D+)", measures):
    print(match.groups())

Output

('0.08', 'm,')
('100.5', 'km')
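
Note that \D+ also picks up the comma in the single-string case. Since the question says the units contain only lowercase or uppercase letters, one option is to restrict the second group to [A-Za-z]+. The following is only a minimal sketch under that assumption; the helper name split_measures is made up for illustration and is not part of the answer above.

```python
import re

# Number part: digits and dots; unit part: ASCII letters only
# (assumption taken from the question's note about the units).
PATTERN = re.compile(r"([\d.]+)([A-Za-z]+)")

def split_measures(text):
    """Return a [number, unit] list for every measure found in text."""
    return [list(m.groups()) for m in PATTERN.finditer(text)]

print(split_measures('0.08m,100.5km'))
# [['0.08', 'm'], ['100.5', 'km']]
```

Because the comma is neither a digit, a dot, nor a letter, it is simply skipped between matches.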

CodePudding user response：

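You can use the regular expression ([0-9.]+) with re.split. It splits on the run of digits and dots, and because the pattern is a capturing group, the matched number is kept in the result. The empty first element appears because the string starts with the number.

Example

>>> import re
>>> s = '125.5km'
>>> re.split(r'([0-9.]+)', s)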
['', '125.5', 'km']
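
If the empty leading string is unwanted, it can be filtered out. A minimal sketch, assuming each input holds a single number followed by its unit:

```python
import re

s = '125.5km'
# re.split keeps the captured number; drop the empty piece before it.
parts = [p for p in re.split(r'([0-9.]+)', s) if p]
print(parts)
# ['125.5', 'km']
```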

CodePudding user response：

If you are trying to match decimals, but not other values like IP addresses or version numbers, which have multiple periods, you can try this:

import re

# match either '123.456', '123', or '.456' format
regex = re.compile(r'^(\d+\.\d+|\d+|\.\d+)\w+$')

units = [ "0.08m", "100.5km", "10kg", ".15l" ]
for unit in units:
    results = re.findall(regex, unit)
    print(unit, results)

badunits = [ "127.0.0.1ip", "1.2.3version", "4.5.6" ]
for unit in badunits:
    results = re.findall(regex, unit)
    print(unit, results)

## output
# 0.08m ['0.08']
# 100.5km ['100.5']
# 10kg ['10']
# .15l ['.15']
# 127.0.0.1ip []
# 1.2.3version []
# 4.5.6 []
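
The pattern in this answer captures only the number. To also get the unit, as the question asks, a second group can be added. This is a sketch rather than a definitive version: it assumes letters-only units (per the question), uses re.fullmatch in place of the ^ and $ anchors, and the name split_strict is hypothetical.

```python
import re

# The three number alternatives from this answer ('123.456', '123', '.456'),
# followed by a second group for a letters-only unit (assumption from the question).
STRICT = re.compile(r'(\d+\.\d+|\d+|\.\d+)([A-Za-z]+)')

def split_strict(s):
    """Return [number, unit], or None if s is not one decimal followed by letters."""
    m = STRICT.fullmatch(s)
    return list(m.groups()) if m else None

for s in ["0.08m", "10kg", ".15l", "127.0.0.1ip", "4.5.6"]:
    print(s, split_strict(s))
# 0.08m ['0.08', 'm']
# 10kg ['10', 'kg']
# .15l ['.15', 'l']
# 127.0.0.1ip None
# 4.5.6 None
```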