I was wondering if someone could point me in the right direction.
Here is a sample of my data.
TRANS,"GUS000017787609","","","INSTL","","","","","","",,"","",20211025,
MTPNT,"",45654,"","","","","",,,
ASSET,"","INSTL","METER","","CR","G4SZV-2","FLN",2020,"XXXTYU422000","32","","LI"
I need to somehow turn this kind of information in a CSV using python. I've got thousands of rows of data and each TRANS,MTPNT and ASSET is considered one "row".
Does anyone know what the best type of technique would be to preform ETL on data of this kind?
CodePudding user response:
You could use the grouper
recipe to read 3 CSV lines at a time and combine them. For example:
import csv
from itertools import zip_longest, chain
def grouper(iterable, n, fillvalue=None):
"Collect data into non-overlapping fixed-length chunks or blocks"
# grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
args = [iter(iterable)] * n
return zip_longest(*args, fillvalue=fillvalue)
with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
for triple_row in grouper(csv_input, 3):
row = list(chain.from_iterable(triple_row))
#row[2] = 'test' # modify 3rd value before writing
csv_output.writerow(row)
Giving you:
TRANS,GUS000017787609,,,INSTL,,,,,,,,,,20211025,,MTPNT,,45654,,,,,,,,,ASSET,,INSTL,METER,,CR,G4SZV-2,FLN,2020,XXXTYU422000,32,,LI