I am trying to use regex to extract data from a csv file with the following format
2022-May-02 00:52:59.773,*,m, 22 57 44.81, -08 18 03.1, 0.835, 4.256, 1.62295918340127, -9.6380650, 58.8566,/L, 38.0538, 1.8729490, 67.895361, -14.70063, n.a., n.a.,
2022-May-02 00:57:59.773,*,m, 22 57 45.40, -08 17 59.6, 0.836, 4.257, 1.62293986289045, -9.6316459, 58.8574,/L, 38.0543, 1.8732214, 67.896937, -14.68939, n.a., n.a.,
2022-May-02 01:02:59.773,*,m, 22 57 45.98, -08 17 56.0, 0.838, 4.258, 1.62292055513269, -9.6253460, 58.8581,/L, 38.0547, 1.8735000, 67.898620, -14.67828, n.a., n.a.
From this, I want to extract the following groups (columns 1, 4 and 5)
2022-May-02 00:52:59.773, 22, 57, 44.81, -08, 18, 03.1
2022-May-02 00:57:59.773, 22, 57, 45.40, -08, 17, 59.6
2022-May-02 01:02:59.773, 22, 57, 45.98, -08, 17, 56.0
The pattern I am currently using is pattern = "^\s?([^,] ),[^,] ,[^,] , (\-*\d ) (\d ) (\d \.\d ), (\-*\d ) (\d ) (\d \.\d )"
and
re.match(pattern, table, flags=re.MULTILINE)
From this, I only get a match with the first line of the csv file.
print(results.groups())
>> ('2022-May-02 01:46:35.916', '22', '57', '44.81', '-08', '18', '03.1')
What do I need to do to match the same pattern on multiple lines?
CodePudding user response:
Since you are using CSV file, don't use REGEX, instead use the inbuildcsv
library, with it you will be able to extract any column with ease.
import csv
with open('csv_file.csv', 'r') as input:
input_reader = csv.reader(input)
for line in input_reader:
print(line[0], line[3], line[4])
#2022-May-02 00:52:59.773, 22 57 44.81, -08 18 03.1
#2022-May-02 00:57:59.773, 22 57 45.40, -08 17 59.6
#2022-May-02 01:02:59.773, 22 57 45.98, -08 17 56.0
CodePudding user response:
You would be best to avoid a regular expression and use Python's CSV library. The last 6 values you want are inside columns 4 and 5 separated by spaces. So you will need to split those out separately. For example:
import csv
with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
csv_input = csv.reader(f_input, skipinitialspace=True)
csv_output = csv.writer(f_output)
for row in csv_input:
csv_output.writerow([row[0], *row[3].split(' '), *row[4].split(' ')])
This would create an output.csv
file containing:
2022-May-02 00:52:59.773,22,57,44.81,-08,18,03.1
2022-May-02 00:57:59.773,22,57,45.40,-08,17,59.6
2022-May-02 01:02:59.773,22,57,45.98,-08,17,56.0