I'm running Python 3.6.8. I need to sum values that appear in a log file. A line may contain 1 to 14 {index, value} pairs; a typical line with 8 pairs is shown in the code below (the variable log_line). The line format, including the '- -' separator, is consistent. I have working code, but I'm not sure it is the most elegant or best way to parse this string; it feels a bit clunky. Any suggestions?
import re
# version 1
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
log_line_values = log_line.split('- -')[1]
values = re.findall(r'{\d+,\s\d+}', log_line_values)
sum_of_values = 0
for v in values:
    sum_of_values += int(v.replace('{','').replace('}','').replace(' ','').split(',')[1])
print(f'1) sum_of_values:{sum_of_values}')
# version 2, essentially the same, but more concise (some may say confusing)
sum_of_values = sum([int(v.replace('{','').replace('}','').replace(' ','').split(',')[1]) for v in re.findall(r'{\d+,\s\d+}', log_line.split('- -')[1])])
print(f'2) sum_of_values:{sum_of_values}')
CodePudding user response:
First, there is no need to strip the prefix: the regex will not match it anyway. Second, we can use a capturing group to capture only the values we care about, in our case the second value in each comma-separated pair. We can then use map(int, iterable) to convert every captured string to an int, and call sum on the resulting numbers.
Putting it all together:
import re
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
values = re.findall(r'{\d+,\s(\d+)}', log_line)
sum_of_values = sum(map(int, values))
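To make the capturing-group behaviour concrete, here is a minimal sketch of what re.findall returns for this line. It assumes the quantifier in the pattern is \d+ (one or more digits), so that two-digit values such as 24 are matched.

```python
import re

log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'

# With exactly one capturing group, re.findall returns a list of the
# captured strings (the second number of each pair), not the full matches.
values = re.findall(r'{\d+,\s(\d+)}', log_line)
print(values)  # ['8', '24', '24', '5', '5', '12', '12', '5']

# map(int, ...) converts each captured string; sum adds them up.
sum_of_values = sum(map(int, values))
print(sum_of_values)  # 95
```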
CodePudding user response:
This is an ideal use case for regular-expression capture groups:
import re
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'
s = sum([int(e[1]) for e in re.findall(pattern, log_line.split('- -')[1])])
print(s) # 95
Here I use re.findall to match the numbers in the input string, then a list comprehension to convert them to integers and sum them.
The advantage of the {(\d+), (\d+)} pattern is that it also captures the first number of each pair, if you need it.
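As a rough illustration of why capturing both numbers can be useful, the sketch below builds a dictionary from index to value. The name by_index is only illustrative, and the pattern assumes \d+ quantifiers and a single space after the comma.

```python
import re

log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'

# With two capturing groups, re.findall returns a list of
# (index, value) tuples of strings.
pairs = re.findall(pattern, log_line)

# Hypothetical helper structure: map each index to its value as integers.
by_index = {int(index): int(value) for index, value in pairs}

print(by_index[1])             # 24
print(sum(by_index.values()))  # 95
```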
CodePudding user response:
Assuming you've already identified that the line is one that matches the pattern, you can simplify your logic a lot by using a generator expression within sum().
import re
# Compile your regular expression for reuse
# Just pull out the last value from each pair
re_extract_val = re.compile(r'{\d+, (\d+)}')
log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
# Use a generator expression within sum() to add all values
sum_of_values = sum(int(val) for val in re_extract_val.findall(log_line))
You could also use map(), but I find it's clearer with a generator expression:
sum_of_values = sum(map(int, re_extract_val.findall(log_line)))
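If you cannot assume that every line matches, one possible way to wrap this up is a small helper that returns None when a line contains no pairs. This is only a sketch: the names PAIR_RE and sum_log_values are made up for illustration, and the \s* after the comma is an assumption that tolerates variable spacing.

```python
import re
from typing import Optional

# Compiled once and reused; captures only the second number of each pair.
PAIR_RE = re.compile(r'{\d+,\s*(\d+)}')


def sum_log_values(line: str) -> Optional[int]:
    """Return the sum of the pair values in a line, or None if there are none."""
    values = PAIR_RE.findall(line)
    if not values:
        return None
    return sum(int(val) for val in values)


log_line = 'Some explanatory text was here: - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'

print(sum_log_values(log_line))                  # 95
print(sum_log_values('A line without any pairs'))  # None
```

Returning None rather than 0 keeps "no pairs found" distinguishable from a line whose values genuinely sum to zero.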