Sorry But I can't use pandas.
I have a sample input csv file that looks like this:
Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A1,B1,C1,D1,E1,F1,G1,H1,I1,J1,K1
A2,B2,C2,D2,E2,F2,G2,H2,I2,J2,K2
A3,B3,C3,D3,E3,F3,G3,H3,I3,J3,K3
A4,B4,C4,D4,E4,F4,G4,H4,I4,J4,K4
A5,B5,C5,D5,E1,F5,G5,H5,I5,J5,K5
A6,B6,C6,D6,E6,F6,G6,H6,I6,J6,K6
A7,B7,C7,D7,E7,F7,G7,H7,I7,J7,K7
A8,B8,C8,D8,E8,F8,G8,H8,I8,J8,K8
A9,B9,C9,D9,E9,F9,G9,H9,I9,J9,K9
Then my sample code looks like this:
import csv
fieldnames_dict = {
'Beta': 'Beta_New',
'Echo': 'Echo_New',
'Foxtrot': 'Foxtrot_New_ALL',
'Hotel': 'Hotel_New',
'India': 'India_New',
'Charlie': 'Charlie_New'
}
with open("book1.csv", "r", encoding="utf-8", errors='ignore') as csv_in:
with open("xtest_file.csv", "w", encoding="utf-8", errors='ignore') as csv_out:
reader = csv.DictReader(csv_in, delimiter=',', quotechar='"')
writer = csv.DictWriter(csv_out, delimiter=',', quotechar='"',
fieldnames=list(fieldnames_dict.values()))
writer.writeheader()
for row_in in reader:
row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
writer.writerow(row_out)
So this is what the output csv SHOULD look like:
Beta_New,Echo_New,Foxtrot_New_ALL,Tango,Victor,Hotel_New,India_New,Charlie_New
B1,E1,F1,T1,V1,H1,I1,C1
B2,E2,F2,T1,V1,H2,I2,C2
B3,E3,F3,T1,V1,H3,I3,C3
B4,E4,F4,T1,V1,H4,I4,C4
B5,E5,F5,T1,V1,H5,I5,C5
B6,E6,F6,T1,V1,H6,I6,C6
B7,E7,F7,T1,V1,H7,I7,C7
B8,E8,F8,T1,V1,H8,I8,C8
B9,E9,F9,T1,V1,H9,I9,C9
So all keys in a dictionary must be unique (so adding two empty string keys in in fieldnames_dict
is impossible). That said, all keys in fieldnames_dict
must already exist in input CSV file because of the row_in[old] in the dictionary comprehension {new: row_in[old] for old, new in fieldnames_dict.items()}
. This means I will have to append/add the new keys at the end, but how do I add the keys "Tango,Victor" and values in between "Foxtrot_New_ALL" and "Hotel_New"?
CodePudding user response:
I feel like you have misunderstood how DictWriter
works.
When you specify fieldnames=...
, DictWriter
doesn't care where those field names come from, as long as it's a list of strings.
When you pass a dictionary to writerow()
, the DictWriter
looks at the fieldnames
that you gave it earlier to decide which keys from that dict to select, and in what order.
So all that matters is that
fieldnames
is a list of column names in the order that you want the columns in your output file.- any dict you pass to
writerow
contains all of the fields specified infieldnames
. The column order is specified byfieldnames
, so the order the keys of this dict is irrelevant.
With this in mind, you can create a list containing your new column names that is separate from the dictionary that tells you how the old columns get renamed.
fields = ['Beta_New',
'Echo_New',
'Foxtrot_New_ALL',
'Tango',
'Victor',
'Hotel_New',
'India_New',
'Charlie_New']
column_rename_dict = {
'Beta': 'Beta_New',
'Echo': 'Echo_New',
'Foxtrot': 'Foxtrot_New_ALL',
'Hotel': 'Hotel_New',
'India': 'India_New',
'Charlie': 'Charlie_New'
}
Then, you use fields
as the fieldnames
argument to your writer, and make sure any dicts you pass to it contain all of these field names.
ctr = 1
with open(...) as csv_in, open(...) as csv_out:
reader = csv.DictReader(csv_in)
writer = csv.DictWriter(csv_out, fieldnames=fields)
writer.writeheader()
for row_in in reader:
# Convert the old field names to new field names
row_out = {new: row_in[old] for old, new in column_rename_dict.items()}
# Set values for extra keys
row_out["Tango"] = f"T{ctr}"
row_out["Victor"] = f"V{ctr}"
ctr = 1
# Write the row
writer.writerow(row_out)
Which gives you your new csv:
Beta_New,Echo_New,Foxtrot_New_ALL,Tango,Victor,Hotel_New,India_New,Charlie_New
B1,E1,F1,T1,V1,H1,I1,C1
B2,E2,F2,T2,V2,H2,I2,C2
B3,E3,F3,T3,V3,H3,I3,C3
B4,E4,F4,T4,V4,H4,I4,C4
B5,E1,F5,T5,V5,H5,I5,C5
B6,E6,F6,T6,V6,H6,I6,C6
B7,E7,F7,T7,V7,H7,I7,C7
B8,E8,F8,T8,V8,H8,I8,C8
B9,E9,F9,T9,V9,H9,I9,C9