I have 3 csv files that I need to merge
all the 3 files have the first three columns being equal while like name, age, sex but other columns are different for all.
I am new to python. I need assistance on this. I can comprehend any code written. Thanks
I have tried some codes but not working
file 1
firstname,secondname,age,address,postcode,height
gdsd,gas,uugd,gusa,uuh,hhuuw
kms,kkoil,jjka,kja,kaja,loj
iiow,uiuw,iue,oijw,uow,oiujw
ujis,oiiw,ywuq,sax,cxv,ywf
file 2
firstname,secondname,age,home-town,spousename,marital_staus
gdsd,gas,uugd,vbs,owu,nsvc
kms,kkoil,jjka,kja,kaja,loj
iiow,uiuw,iue,xxfaf,owuq,pler
ujis,oiiw,ywuq,gfhd,lzac,oqq
file 3
firstname,secondname,age,drive,educated,
gdsd,gas,uugd,no,yes
kms,kkoil,jjka,no,no
iiow,uiuw,iue,yes,no
ujis,oiiw,ywuq,yes,yes
desired result
firstname,secondname,age,hometown,spousename,marital_status,adress,post_code,height,drive,educated
note that firstname,secondname,age
is the same across the 3 tables
I need valid codes please
CodePudding user response:
You can use the csv module in Python to read and merge the CSV files without using pandas. Here's an example of how you can merge three CSV files with the same first three columns (name, age, sex) and different other columns:
import csv
# list to hold the contents of all csv files
data = []
# read the contents of each csv file
with open("file1.csv", "r") as f1:
reader = csv.reader(f1)
data = [row for row in reader]
with open("file2.csv", "r") as f2:
reader = csv.reader(f2)
data = [row for row in reader]
with open("file3.csv", "r") as f3:
reader = csv.reader(f3)
data = [row for row in reader]
# write the merged contents to a new csv file
with open("merged_file.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(data)
P.s it will work only if all the CSV files have the same structure, the same columns and the same number of rows. Also, this code will not check if the data is unique or not.
CodePudding user response:
Here's a generic solution for concatenating CSV files that have heterogeneous headers with Python.
What you need to do first is to read the header of each CSV file for determining the "unique" field names.
Then, you just have to read each input record and output it after converting it to match the new header (composed of the unique fields).
#!/usr/bin/env python3
import csv
paths = [ 'file1.csv', 'file2.csv', 'file3.csv' ]
fieldnames = set()
for p in paths:
with open(p,'r') as f:
reader = csv.reader(f)
fieldnames.update( next(reader) )
with open('combined.csv', 'w') as o:
writer = csv.DictWriter(o, fieldnames = fieldnames)
writer.writeheader()
for p in paths:
with open(p,'r') as f:
reader = csv.DictReader(f)
writer.writerows( reader )
remark: I open the files twice, so it won't work for inputs that are streams (for ex. sys.stdin
)