I have files that contain both strings and floats. I am interested in finding the float that follows a specific string. Any help writing a function that reads the file, looks for that specific string, and returns the float after it would be much appreciated.
Thanks
An example of a file is
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""
import re
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""
str_to_search = 'xxxxxxxxxx'
num = re.findall(r'^' + str_to_search + r' (\d+\.\d+)', lines, flags=re.M)
print(num)
This works if there is no negative sign. In other words, if the number after the string 'xxxxxxxxxx' is '1.099' rather than '-1.099', it works fine. My question is how to generalize it so that it also handles negative numbers, given that the number can be positive (no sign) or negative (with a minus sign).
CodePudding user response:
You can use regex
(-?\d+\.?\d*)
import re
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5
xxxxxxxxxx 1.099"""
str_to_search = "xxxxxxxxxx"
num = re.findall(fr"(?m)^{str_to_search}\s+(-?\d+\.?\d*)", lines)
print(num)
Prints:
['-1.099', '1.099']
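Since the question asks for a function, here is a minimal sketch of how the same idea could be wrapped up. The name find_floats_after and its signature are my own, not part of any library. It assumes the search string sits at the start of a line and is separated from the number by spaces or tabs. re.escape is added in case the search string contains regex metacharacters, and the results are converted to float.

```python
import re

# Hypothetical helper; the name and signature are illustrative only.
def find_floats_after(text, label):
    """Return every float that directly follows `label` at the start of a line."""
    # re.escape protects against regex metacharacters inside the label.
    # [ \t]+ keeps the match on one line; [-+]? allows an optional sign;
    # \d+(?:\.\d+)? accepts both 12 and 12.0.
    pattern = rf"^{re.escape(label)}[ \t]+([-+]?\d+(?:\.\d+)?)"
    return [float(m) for m in re.findall(pattern, text, flags=re.M)]

sample = """qq vvv rrr ssssa 22.6
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5
xxxxxxxxxx 1.099"""

print(find_floats_after(sample, "xxxxxxxxxx"))  # [-1.099, 1.099]
```

To use it on a file, read the contents first (for example with open(path).read()) and pass the resulting string as text.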
CodePudding user response:
You can change the regex to following:
num = re.findall(r'^' + str_to_search + r' (-?\d+\.?\d*)', lines, flags=re.M)
CodePudding user response:
I would just split the entire file content at every space. This gives a list of all the strings and floats. Then use list.index() to find the index of the string you are searching for, wrapped in try/except so that your code won't stop if the string is not in the contents. Then just read the next element and try to convert it to a float. Code:
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""
lines = lines.replace("\n", " ").split(" ")  # replace the newlines with spaces so they are split as well
try:
    float_index = lines.index("xxxxxxxxxx") + 1  # index of the element after the string you are trying to find
    num = float(lines[float_index])
except Exception as e:
    print(e)  # string not found, or the next element is not a number
else:
    print(num)  # only reached when the conversion succeeded
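Note that list.index() only returns the first occurrence. If the string can appear more than once, the same splitting idea can be sketched as a small function that collects every match. The name floats_after_token is made up for illustration, and it assumes the search string is a single whitespace-separated word.

```python
# Hypothetical helper; the name and signature are illustrative only.
def floats_after_token(text, token):
    """Return the floats that immediately follow each occurrence of `token`."""
    words = text.split()  # no argument: split on any whitespace, newlines included
    found = []
    # Pair each word with the word that comes right after it.
    for current, following in zip(words, words[1:]):
        if current == token:
            try:
                found.append(float(following))
            except ValueError:
                pass  # the next word is not a number, so skip it
    return found

sample = """qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""

print(floats_after_token(sample, "xxxxxxxxxx"))  # [-1.099]
```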
If you are looking for a solution in regex, use Andrej Kesely's answer.