How to get the real number after a string in a file

I have files that contain both strings and floats. I am interested in finding the float that follows a specific string. Any help writing a function that reads the file, looks for that specific string, and returns the float after it would be much appreciated.

Thanks

An example of a file is

lines = """aaaaaaaaaaaaaaa  bbbbbbbbbbbbbbb  cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""

import re

lines = """aaaaaaaaaaaaaaa  bbbbbbbbbbbbbbb  cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""

str_to_search = 'xxxxxxxxxx'
num = re.findall(r'^' + str_to_search + r' (\d+\.\d+)', lines, flags=re.M)
print(num)

This works if there is no negative sign. In other words, if the number after the string 'xxxxxxxxxx' is 1.099 rather than -1.099, it works fine. My question is how to generalize the pattern so that it handles both positive numbers (no sign) and negative numbers (with a leading minus sign).

CodePudding user response：

You can use the regex

(-?\d+\.?\d*)

import re

lines = """aaaaaaaaaaaaaaa  bbbbbbbbbbbbbbb  cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5
xxxxxxxxxx 1.099"""

str_to_search = "xxxxxxxxxx"
num = re.findall(fr"(?m)^{str_to_search}\s+(-?\d+\.?\d*)", lines)
print(num)

Prints:

['-1.099', '1.099']
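Since the question asks for a function that reads a file, the pattern above could be wrapped up roughly as follows. This is only a sketch under a few assumptions: the names `FLOAT_PATTERN`, `find_floats_after` and `find_floats_in_file` are made up for illustration, the file is assumed to be UTF-8 text, and the number pattern is widened a little (optional `+`, bare fractions such as `.5`, exponents) beyond what the answer uses. `re.escape` is added so that a search string containing regex metacharacters is matched literally.

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction),
# and an optional exponent. Wider than the answer's pattern; an assumption.
FLOAT_PATTERN = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def find_floats_after(text, label):
    """Return every float that follows `label` at the start of a line."""
    pattern = re.compile(
        rf"^{re.escape(label)}\s+({FLOAT_PATTERN})", flags=re.M
    )
    return [float(match) for match in pattern.findall(text)]


def find_floats_in_file(path, label):
    """Hypothetical helper: read `path` as UTF-8 text and search it."""
    with open(path, encoding="utf-8") as handle:
        return find_floats_after(handle.read(), label)


lines = """aaaaaaaaaaaaaaa  bbbbbbbbbbbbbbb  cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5
xxxxxxxxxx 1.099"""

print(find_floats_after(lines, "xxxxxxxxxx"))  # [-1.099, 1.099]
```

The function returns floats rather than strings, and an empty list when the string never appears, so the caller does not need a try/except around it.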

CodePudding user response：

You can change the regex to the following:

num = re.findall(r'^' + str_to_search + r' (-?\d+\.?\d*)', lines, flags=re.M)
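To see what the change buys, here is a small side-by-side comparison of the pattern from the question and the modified one. The three-line sample is made up for illustration; the variable names `unsigned` and `signed` are not from the thread.

```python
import re

# Illustrative input: a negative float, a positive float, and an integer.
lines = """xxxxxxxxxx -1.099
xxxxxxxxxx 1.099
xxxxxxxxxx 7"""

str_to_search = 'xxxxxxxxxx'

# Pattern from the question: no sign allowed, decimal point required.
unsigned = re.findall(r'^' + str_to_search + r' (\d+\.\d+)', lines, flags=re.M)

# Modified pattern: optional minus sign, optional decimal part.
signed = re.findall(r'^' + str_to_search + r' (-?\d+\.?\d*)', lines, flags=re.M)

print(unsigned)  # ['1.099']
print(signed)    # ['-1.099', '1.099', '7']
```

Note that both patterns expect exactly one space between the string and the number; if the file may use tabs or several spaces, `\s+` in place of the literal space is probably the safer choice.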

CodePudding user response：

I would just split the entire file content at every space. This gives a list of all strings and floats. Then use list.index() to find the index of the string you are searching for, wrapped in try/except so your code won't stop if the string is not in the contents. Then read the next element and try to convert it to a float. Code:

lines = """aaaaaaaaaaaaaaa  bbbbbbbbbbbbbbb  cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""

lines = lines.replace("\n", " ").split(" ") # replace the newlines with spaces to split them as well

try:
    float_index = lines.index("xxxxxxxxxx") + 1 # Get the element after the string you are trying to find

    num = float(lines[float_index])
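except (ValueError, IndexError) as e:
    # ValueError: string not found or next element is not a float; IndexError: nothing follows the string
    print(e)
else:
    print(num) # Only print when a float was actually found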

If you are looking for a solution in regex, use Andrej Kesely's answer.
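The split-based idea can also be turned into a function that collects every occurrence instead of only the first one. The following is a sketch, not a drop-in replacement: the name `floats_after_token` is hypothetical, and it assumes the search string is a single word with no whitespace in it. Calling `split()` with no argument splits on any run of whitespace, so newlines and repeated spaces need no special handling.

```python
def floats_after_token(text, token):
    """Collect the float that follows each occurrence of `token` in `text`."""
    words = text.split()  # no argument: split on any run of whitespace
    found = []
    # Walk over each word together with the word that comes right after it.
    for current, following in zip(words, words[1:]):
        if current != token:
            continue
        try:
            found.append(float(following))
        except ValueError:
            pass  # the next word is not a number, so skip this occurrence
    return found


lines = """aaaaaaaaaaaaaaa  bbbbbbbbbbbbbbb  cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""

print(floats_after_token(lines, "xxxxxxxxxx"))  # [-1.099]
```

Unlike the regex answers, this matches the word anywhere in the text, not only at the start of a line, which may or may not be what you want.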