How to extract numbers attached to a set of characters in Python

Time:01-05

Suppose that you have a string with a lot of numbers that are attached to, or very close to, some characters, like this:

string = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"

and I want it to return:

[4GB, 64GB, 4GB, 4KB]

I'm trying

import re
def extract_gb(string):
    gb = re.findall(r'[0-9]+', string)
    return gb

extract_gb(string)

output [4, 64, 4, 4]

This gives just the numbers as output, but I would like to get each number together with the set of characters attached to it or close to it. I expect the output [4GB, 64GB, 4GB, 4KB]

I appreciate any kind of help.

CodePudding user response:

Use

r'\b[0-9]+\s?[A-Za-z]+\b'

'\b' stands for a word boundary. Without it, the pattern would also match inside tokens such as "M4M" or "4M4".

'\s?' allows one optional whitespace character, for something like "64 GB". I take it this is what is meant by "very close".
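To make this concrete, here is a minimal sketch that plugs the pattern into re.findall and runs it on the string from the question. The names PATTERN and extract_sizes are made up for illustration. Removing the inner whitespace is an extra step I am assuming is wanted, since the expected output shows "64GB" rather than "64 GB".

```python
import re

# Pattern from the answer: digits, one optional whitespace character,
# then ASCII letters, with a word boundary on each side.
PATTERN = re.compile(r'\b[0-9]+\s?[A-Za-z]+\b')

def extract_sizes(text):
    """Return number+letters tokens such as '4GB' (hypothetical helper)."""
    # Assumption: the caller wants "64 GB" normalised to "64GB",
    # so any whitespace inside a match is removed.
    return [re.sub(r'\s', '', match) for match in PATTERN.findall(text)]

string = ("I have a cellphone with 4GB of ram and 64 GB of rom, "
          "My last computer had 4GB of ram and NASA only had 4KB when ... "
          "that's incredible")

print(extract_sizes(string))  # ['4GB', '64GB', '4GB', '4KB']
```

Keep in mind that the optional whitespace makes the pattern fairly permissive: a number followed by any word, as in "4 apples", would match too. If only size units are wanted, replacing [A-Za-z]+ with an explicit alternation such as (?:KB|MB|GB) is probably safer.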

Note that this solution does not take into account non-English characters.
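If non-English letters do matter, one possible variation (not part of the answer above, just a sketch) is to swap [A-Za-z] for [^\W\d_]. In Python 3, \w is Unicode-aware for str patterns, so "a word character that is neither a digit nor an underscore" comes down to "any letter". The names UNICODE_PATTERN and extract_sizes_unicode are hypothetical.

```python
import re

# [^\W\d_] = any word character except digits and the underscore,
# i.e. letters in any script (Python 3 str patterns are Unicode-aware).
UNICODE_PATTERN = re.compile(r'\b[0-9]+\s?[^\W\d_]+\b')

def extract_sizes_unicode(text):
    """Like the ASCII version, but also accepts non-English letters."""
    # Assumption: inner whitespace should be dropped, as in "64 GB" -> "64GB".
    return [re.sub(r'\s', '', match) for match in UNICODE_PATTERN.findall(text)]

print(extract_sizes_unicode("5ГБ and 64 GB"))  # ['5ГБ', '64GB']
```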
