I am trying to compare between 2 CSV files and if one of the columns' values match, then I will need to replace the values of the other csv file with the second csv file's values.
Example:
Book1.csv
:
Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A1,B1,C1,D1,E1,F1,G1,H1,I1,J1,
A2,B2,C2,D2,E2,F2,G2,H2,I2,J2,1
A3,B3,C3,D3,E3,F3,G3,H3,I3,J3,
A4,B4,C4,D4,E4,F4,G4,H4,I4,J4,
A5,B5,C5,D5,E1,F5,G5,H5,I5,J5,1
A6,B6,C6,D6,E6,F6,G6,H6,I6,J6,1
A7,B7,C7,D7,E7,F7,G7,H7,I7,J7,
A8,B8,C8,D8,E8,F8,G8,H8,I8,J8,1
A9,B9,C9,D9,E9,F9,G9,H9,I9,J9,
Book2.csv
:
Oscar,Papa,Lima
A1_x,B1,K2
A2,B2,X2
A3_x,B3,L2
A4_x,B4,K2
A5,B5,J2
A6,B6,A2
A7_x,B7,AS
A8,B8,S3
A9_x,B9,S1
If Book2.csv
column "Oscar"'s values are equal to Book1.csv
"Alfa"'s values (A2==A2
, A3!=A3
, A4==A4
), then Book2.csv
's "Lima" values will overwrite whatever is in Book1.csv
's "Beta" values.
So the output for Beta_New
will look like this (it looks like this because of the code below where I am switching and calling the columns by a dictionary:
xtest_file.csv
:
Beta_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
X2,E2,F2,H2,I2,C2
J2,E5,F5,H5,I5,C5
A2,E6,F6,H6,I6,C6
S3,E8,F8,H8,I8,C8
My code so far:
import csv
fieldnames_dict = {
'Beta': 'Beta_New',
'Echo': 'Echo_New',
'Foxtrot': 'Foxtrot_New_ALL',
'Hotel': 'Hotel_New',
'India': 'India_New',
'Charlie': 'Charlie_New'
}
1_open_cd_csv = open("book1.csv", "r", encoding="utf-8", errors='ignore')
1_reader_cd_csv = csv.DictReader(1_open_cd_csv, delimiter=',', quotechar='"')
1_header_csv = next(1_reader_cd_csv)
2_open_cd_csv = open("book2.csv", "r", encoding="utf-8", errors='ignore')
2_reader_cd_csv = csv.DictReader(2_open_cd_csv, delimiter=',', quotechar='"')
2_header_csv = next(2_reader_cd_csv)
open_output_test_csv = open(xtest_file.csv, "w", encoding="utf-8", errors='ignore')
writer_output_test_csv = csv.DictWriter(open_output_test_csv, delimiter=',',
quotechar='"', quoting=csv.QUOTE_ALL,
fieldnames=list(fieldnames_dict.values()))
writer_output_test_csv.writeheader()
for row_in in 1_reader_cd_csv:
if row_in['Kilo'] == "1":
row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
writer.writerow(row_out)
#if 2_reader_cd_csv[Oscar]'s values = 1_reader_cd_csv[Alfa]'s values:
#then the output for "Beta_New" = the values for 2_reader_cd_csv[Lima]
or mabye something like this:
for row_in in 1_reader_cd_csv:
alfa_match = 2_reader_cd_csv['Oscar'] matches with = row_in['Alfa']
if alfa_match != none and row_in['Kilo'] == "1":
row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
#then the output for "Beta_New" = the values for 2_reader_cd_csv[Lima]
writer.writerow(row_out)
CodePudding user response:
Break your code into smaller chunks for each part:
- Read rows of "book1.csv" where Kilo="1" to
file1
- Read rows of "book2.csv" to
file2
- Replace values of "Beta" in
file1
based onfile2
values of Oscar - Write back to a new csv file
with open("book1.csv", "r") as infile:
reader = csv.DictReader(infile)
file1 = [row for row in reader if row["Kilo"]=='1']
with open("book2.csv") as infile:
reader = csv.DictReader(infile)
file2 = [row for row in reader]
output = list()
oscars = [row["Oscar"] for row in file2]
for row in file1:
if row["Alfa"] in oscars:
row["Beta"] = [r["Lima"] for r in file2 if r["Oscar"]==row["Alfa"]][0]
output.append({new:row[old] for old,new in fieldnames_dict.items()})
with open("output.csv", "w", newline="") as outfile:
writer = csv.DictWriter(outfile,fieldnames=list(fieldnames_dict.values()))
writer.writeheader()
for row in output:
writer.writerow(row)
output.csv:
Beta_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
X2,E2,F2,H2,I2,C2
J2,E1,F5,H5,I5,C5
A2,E6,F6,H6,I6,C6
S3,E8,F8,H8,I8,C8
CodePudding user response:
If Book2.csv column "Oscar"'s values are equal to Book1.csv "Alfa"'s values (A2==A2, A3!=A3, A4==A4), then Book2.csv's "Lima" values will overwrite whatever is in Book1.csv's "Beta" values.
You might want to give pandas
a try:
import pandas as pd
#read csvs as dataframes
df1 = pd.read_csv("book1.csv")
df2 = pd.read_csv("book2.csv")
#replace 'Beta' in first df with the value in 'Lima' where 'Alfa' matches 'Oscar'.
df1['Beta'].where(df1['Alfa'] != df2['Oscar'], df2['Lima'], inplace=True)
#store as csv
df1.to_csv('new_file.csv')
Output df1:
Alfa | Beta | Charlie | Delta | Echo | Foxtrot | Golf | Hotel | India | Juliett | Kilo | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | A1 | B1 | C1 | D1 | E1 | F1 | G1 | H1 | I1 | J1 | nan |
1 | A2 | X2 | C2 | D2 | E2 | F2 | G2 | H2 | I2 | J2 | 1 |
2 | A3 | B3 | C3 | D3 | E3 | F3 | G3 | H3 | I3 | J3 | nan |
3 | A4 | B4 | C4 | D4 | E4 | F4 | G4 | H4 | I4 | J4 | nan |
4 | A5 | J2 | C5 | D5 | E1 | F5 | G5 | H5 | I5 | J5 | 1 |
5 | A6 | A2 | C6 | D6 | E6 | F6 | G6 | H6 | I6 | J6 | 1 |
6 | A7 | B7 | C7 | D7 | E7 | F7 | G7 | H7 | I7 | J7 | nan |
7 | A8 | S3 | C8 | D8 | E8 | F8 | G8 | H8 | I8 | J8 | 1 |
8 | A9 | B9 | C9 | D9 | E9 | F9 | G9 | H9 | I9 | J9 | nan |