Suppose that you have a string with a lot of numbers that are attached or very close to some characters, like this:
string = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"
and I wanted it to return:
[4GB, 64GB, 4GB, 4KB]
I'm trying:
import re

def extract_gb(string):
    # '[0-9]+' matches runs of digits only, so the units are dropped
    gb = re.findall('[0-9]+', string)
    return gb

extract_gb(string)
Output: [4, 64, 4, 4]
This gives just the numbers, but I would like to get each number together with the letters attached or close to it. I expect the output [4GB, 64GB, 4GB, 4KB].
I appreciate any kind of help.
CodePudding user response:
Use
r'\b[0-9]+\s?[A-Za-z]+\b'
The '\b' is for word boundaries. Otherwise the "4M" in "M4M" or in "4M4" would also match.
There is also an optional whitespace character (\s?), for something like "64 GB". I take it this is what is meant by "very close".
Note that this solution does not take into account non-English characters.
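Here is a minimal sketch of how the pattern could be used to get exactly the expected list. One thing to be aware of: with the optional whitespace, "64 GB" is matched with the space included, so this sketch adds two capture groups (my addition, not part of the pattern above) and joins the number and the letters. The function name extract_sizes is hypothetical.

```python
import re

# Digits, one optional whitespace character, then letters, bounded by word boundaries.
# The two capture groups are an assumption added here so the space can be dropped.
PATTERN = re.compile(r'\b([0-9]+)\s?([A-Za-z]+)\b')

def extract_sizes(text):
    # findall returns (number, letters) tuples when the pattern has two groups
    return [number + unit for number, unit in PATTERN.findall(text)]

string = "I have a cellphone with 4GB of ram and 64 GB of rom, My last computer had 4GB of ram and NASA only had 4KB when ... that's incredible"
print(extract_sizes(string))  # ['4GB', '64GB', '4GB', '4KB']
```

Keep in mind that the pattern accepts any letters after the number, so something like "4 computers" would come back as "4computers". If only certain units are wanted, replacing [A-Za-z]+ with an explicit alternation such as (?:KB|MB|GB) is probably safer.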