Extract digits from string with consecutive digit characters

Time:11-22

I cannot use regular expressions or libraries :(. I need to extract all digits from an alphanumeric string. Each consecutive sequence of digits (call it a "temperature") is preceded by a space, -, or * and should be treated as a single number (all are integers, no floats). Other non-digit characters in the string can be ignored. I need to extract each "temperature" into a data structure.

Example String "BARN21 77-48CDAIRY87 56-12" yields [21, 77, 48, 87, 56, 12]

The actual data string can be many orders of magnitude larger.

All the solutions I can find assume there is only one sequence of digits (temperature) in the string, or that the temperatures are separated by a space or other delimiter. I got it working by iterating through the string, adding a space before and after each digit sequence, and then using split, but that feels like cheating. I wonder whether you professionals distort the data to get a happy solution?

The incoming data is "BARN21 77-48CDAIRY87 56-12"; temp is what I change the data to:

temp = "BARN* 21   77 - 48 DAIRY* 87   56 - 12"
# keep only the whitespace-separated tokens that are all digits
result = [int(i) for i in temp.split() if i.isdigit()]
print("The result ", result)

The result [21, 77, 48, 87, 56, 12]

CodePudding user response:

Here is a version which does not use regular expressions:

inp = "BARN21 77-48CDAIRY87 56-12"
inp = ''.join(' ' if not ch.isdigit() else ch for ch in inp).strip()
nums = inp.split()
print(nums)  # ['21', '77', '48', '87', '56', '12']
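Because the input may be very large, a single pass that accumulates digits as it goes avoids building an intermediate string and gives integers directly. The following is only a minimal sketch, assuming the temperatures are non-negative integers and that "-" is a separator rather than a minus sign, as in the question's example. The name extract_numbers is a hypothetical helper, not something from the question.

```python
def extract_numbers(text):
    """Collect each run of consecutive ASCII digits in text as an int."""
    numbers = []
    current = 0       # value of the digit run currently being read
    in_run = False    # True while we are inside a digit run
    for ch in text:
        if "0" <= ch <= "9":
            # extend the current number by one decimal digit
            current = current * 10 + (ord(ch) - ord("0"))
            in_run = True
        elif in_run:
            # a non-digit ends the run, so store the finished number
            numbers.append(current)
            current = 0
            in_run = False
    if in_run:
        # flush a run that reaches the end of the string
        numbers.append(current)
    return numbers


print(extract_numbers("BARN21 77-48CDAIRY87 56-12"))  # [21, 77, 48, 87, 56, 12]
```

The comparison "0" <= ch <= "9" is used instead of ch.isdigit() because isdigit() also accepts some non-ASCII digit characters (superscripts, for example) that int() cannot convert.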

If regex is available to you, you can use re.findall with the pattern \d+ :

import re

inp = "BARN21 77-48CDAIRY87 56-12"
nums = re.findall(r'\d+', inp)  # \d+ matches one or more consecutive digits
print(nums)  # ['21', '77', '48', '87', '56', '12']
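If integers are wanted rather than strings, and regex is allowed, the matches can be converted one at a time. This is a sketch under the same assumptions as the question's example (non-negative integers only). The name iter_numbers is a hypothetical helper, and re.finditer is used so that a very long input does not need a full list of match strings in memory first.

```python
import re


def iter_numbers(text):
    """Yield each run of ASCII digits in text as an int, lazily."""
    for match in re.finditer(r"[0-9]+", text):
        yield int(match.group())


print(list(iter_numbers("BARN21 77-48CDAIRY87 56-12")))  # [21, 77, 48, 87, 56, 12]
```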