Home > Software design >  How do I use regex to find multiple matches in the same string?
How do I use regex to find multiple matches in the same string?

Time:05-03

I am trying to use regex to extract data from a csv file with the following format

 2022-May-02 00:52:59.773,*,m, 22 57 44.81, -08 18 03.1,    0.835,  4.256,  1.62295918340127, -9.6380650,   58.8566,/L,   38.0538,   1.8729490,  67.895361,  -14.70063,         n.a.,    n.a.,
 2022-May-02 00:57:59.773,*,m, 22 57 45.40, -08 17 59.6,    0.836,  4.257,  1.62293986289045, -9.6316459,   58.8574,/L,   38.0543,   1.8732214,  67.896937,  -14.68939,         n.a.,    n.a.,
 2022-May-02 01:02:59.773,*,m, 22 57 45.98, -08 17 56.0,    0.838,  4.258,  1.62292055513269, -9.6253460,   58.8581,/L,   38.0547,   1.8735000,  67.898620,  -14.67828,         n.a.,    n.a.

From this, I want to extract the following groups (columns 1, 4 and 5)

2022-May-02 00:52:59.773, 22, 57, 44.81, -08, 18, 03.1 
2022-May-02 00:57:59.773, 22, 57, 45.40, -08, 17, 59.6
2022-May-02 01:02:59.773, 22, 57, 45.98, -08, 17, 56.0

The pattern I am currently using is pattern = "^\s?([^,] ),[^,] ,[^,] , (\-*\d ) (\d ) (\d \.\d ), (\-*\d ) (\d ) (\d \.\d )"

and re.match(pattern, table, flags=re.MULTILINE)

From this, I only get a match with the first line of the csv file.

print(results.groups()) 
>> ('2022-May-02 01:46:35.916', '22', '57', '44.81', '-08', '18', '03.1')

What do I need to do to match the same pattern on multiple lines?

CodePudding user response:

Since you are using CSV file, don't use REGEX, instead use the inbuildcsv library, with it you will be able to extract any column with ease.

import csv

with open('csv_file.csv', 'r') as input:
    input_reader = csv.reader(input)
    for line in input_reader:
        print(line[0], line[3], line[4])

#2022-May-02 00:52:59.773, 22 57 44.81, -08 18 03.1 
#2022-May-02 00:57:59.773, 22 57 45.40, -08 17 59.6
#2022-May-02 01:02:59.773, 22 57 45.98, -08 17 56.0

CodePudding user response:

You would be best to avoid a regular expression and use Python's CSV library. The last 6 values you want are inside columns 4 and 5 separated by spaces. So you will need to split those out separately. For example:

import csv

with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input, skipinitialspace=True)
    csv_output = csv.writer(f_output)
    
    for row in csv_input:
        csv_output.writerow([row[0], *row[3].split(' '), *row[4].split(' ')])

This would create an output.csv file containing:

2022-May-02 00:52:59.773,22,57,44.81,-08,18,03.1
2022-May-02 00:57:59.773,22,57,45.40,-08,17,59.6
2022-May-02 01:02:59.773,22,57,45.98,-08,17,56.0
  • Related