Any suggestions to improve Python string parsing

Time:03-31

I'm running Python 3.6.8. I need to sum values that appear in a log file. A line may contain 1 to 14 {index, value} pairs; a typical line with 8 pairs is in the code below (the variable called 'log_line'). The line format with the '- -' separator is consistent. I have working code, but I'm not sure whether this is the most elegant or best way to parse this string; it feels a bit clunky. Any suggestions?

    import re
    
    # version 1
    log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
    log_line_values = log_line.split('- -')[1]
    values = re.findall(r'{\d+,\s\d+}',log_line_values)
    sum_of_values = 0
    for v in values:
        sum_of_values += int(v.replace('{','').replace('}','').replace(' ','').split(',')[1])
    print(f'1) sum_of_values:{sum_of_values}')

    # version 2, essentially the same, but more concise (some may say confusing)
    sum_of_values = sum([int(v.replace('{','').replace('}','').replace(' ','').split(',')[1]) for v in re.findall(r'{\d+,\s\d+}',log_line.split('- -')[1])])
    print(f'2) sum_of_values:{sum_of_values}')

CodePudding user response:

First, there's no need to strip the prefix; the regex simply won't match it. Second, we can use a capturing group to capture only the values we care about: in our case, the second value in each comma-separated pair. We can use map(int, iterable) to turn every matched string into an int, and then call sum on the result.

Putting it all together:

import re

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
values = re.findall(r'{\d+,\s(\d+)}', log_line)
sum_of_values = sum(map(int, values))
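
To see why the capturing group helps, here is a minimal sketch comparing what re.findall returns with and without the group, using the same sample line. With no group it returns each whole match; with exactly one group it returns only the captured text, which is why map(int, ...) can be applied directly. The names whole_matches and captured are only illustrative.

```python
import re

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'

# No capturing group: findall returns each whole match, braces included.
whole_matches = re.findall(r'{\d+,\s\d+}', log_line)
print(whole_matches[:2])  # ['{0, 8}', '{1, 24}']

# One capturing group: findall returns only the captured value strings.
captured = re.findall(r'{\d+,\s(\d+)}', log_line)
print(captured[:2])  # ['8', '24']

sum_of_values = sum(map(int, captured))
print(sum_of_values)  # 95
```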

CodePudding user response:

This is an ideal use case for regular-expression capture groups:

import re

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'

s = sum([int(e[1]) for e in re.findall(pattern, log_line.split('- -')[1])])

print(s) # 95

Here I use re.findall to match the numbers in the input string, then a list comprehension to convert the second number of each pair to an integer before summing.

The advantage of the {(\d+), (\d+)} pattern is that it also lets you extract the first number (if you need it).
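
As a rough sketch of what extracting the first number could look like, the two groups can be collected into a dict keyed by index. This assumes each index appears at most once per line (a repeated index would overwrite the earlier value); the name pairs is just for illustration.

```python
import re

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'

# With two groups, findall returns (index, value) tuples of strings.
# Assumption: indexes are unique within a line.
pairs = {int(index): int(value) for index, value in re.findall(pattern, log_line)}

print(pairs[1])             # 24
print(sum(pairs.values()))  # 95
```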

CodePudding user response:

Assuming you've already identified that the line is one that matches the pattern, you can simplify your logic a lot by using a generator expression within sum().

import re

# Compile your regular expression for reuse
# Just pull out the last value from each pair
re_extract_val = re.compile(r'{\d+, (\d+)}')

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'

# Use a generator expression within sum() to add all values
sum_of_values = sum(int(val) for val in re_extract_val.findall(log_line))

You could also use map(), but I find it's clearer with a generator expression:

sum_of_values = sum(map(int, re_extract_val.findall(log_line)))
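
If you can't assume every line matches, one possible approach is to check for the '- -' separator first and skip lines that lack it. The following is only a sketch: the helper name sum_line_values, the choice of returning None for non-matching lines, and the second and third sample lines are all made up for illustration.

```python
import re

RE_EXTRACT_VAL = re.compile(r'{\d+, (\d+)}')


def sum_line_values(line):
    """Sum the values on one log line; return None if '- -' is absent."""
    # partition() returns an empty separator when '- -' is not found.
    _, separator, payload = line.partition('- -')
    if not separator:
        return None
    return sum(int(val) for val in RE_EXTRACT_VAL.findall(payload))


lines = [
    'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}',
    'An unrelated line without the separator',  # hypothetical
    'Another hypothetical line:      - -{0, 3}',  # hypothetical
]
totals = [sum_line_values(line) for line in lines]
print(totals)  # [95, None, 3]
```

Returning None rather than 0 keeps "no pairs on this line" distinguishable from "not a matching line", which may or may not matter for your logs.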