Home > database >  How to combine CSV files without using pandas
How to combine CSV files without using pandas

Time:01-18

I have 3 csv files that I need to merge

all the 3 files have the first three columns being equal while like name, age, sex but other columns are different for all.

I am new to python. I need assistance on this. I can comprehend any code written. Thanks

I have tried some codes but not working

file 1
firstname,secondname,age,address,postcode,height
gdsd,gas,uugd,gusa,uuh,hhuuw
kms,kkoil,jjka,kja,kaja,loj
iiow,uiuw,iue,oijw,uow,oiujw
ujis,oiiw,ywuq,sax,cxv,ywf

file 2
firstname,secondname,age,home-town,spousename,marital_staus
gdsd,gas,uugd,vbs,owu,nsvc
kms,kkoil,jjka,kja,kaja,loj
iiow,uiuw,iue,xxfaf,owuq,pler
ujis,oiiw,ywuq,gfhd,lzac,oqq

   file 3
firstname,secondname,age,drive,educated,
gdsd,gas,uugd,no,yes
kms,kkoil,jjka,no,no
iiow,uiuw,iue,yes,no
ujis,oiiw,ywuq,yes,yes

desired result

firstname,secondname,age,hometown,spousename,marital_status,adress,post_code,height,drive,educated

note that firstname,secondname,age is the same across the 3 tables

I need valid codes please

CodePudding user response:

You can use the csv module in Python to read and merge the CSV files without using pandas. Here's an example of how you can merge three CSV files with the same first three columns (name, age, sex) and different other columns:

import csv

# list to hold the contents of all csv files
data = []

# read the contents of each csv file
with open("file1.csv", "r") as f1:
    reader = csv.reader(f1)
    data  = [row for row in reader]

with open("file2.csv", "r") as f2:
    reader = csv.reader(f2)
    data  = [row for row in reader]

with open("file3.csv", "r") as f3:
    reader = csv.reader(f3)
    data  = [row for row in reader]

# write the merged contents to a new csv file
with open("merged_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

P.s it will work only if all the CSV files have the same structure, the same columns and the same number of rows. Also, this code will not check if the data is unique or not.

CodePudding user response:

Here's a generic solution for concatenating CSV files that have heterogeneous headers with Python.

What you need to do first is to read the header of each CSV file for determining the "unique" field names.

Then, you just have to read each input record and output it after converting it to match the new header (composed of the unique fields).

#!/usr/bin/env python3
import csv

paths = [ 'file1.csv', 'file2.csv', 'file3.csv' ]

fieldnames = set()
for p in paths:
    with open(p,'r') as f:
        reader = csv.reader(f)
        fieldnames.update( next(reader) )

with open('combined.csv', 'w') as o:
    writer = csv.DictWriter(o, fieldnames = fieldnames)
    writer.writeheader()
    for p in paths:
        with open(p,'r') as f:
            reader = csv.DictReader(f)
            writer.writerows( reader )

remark: I open the files twice, so it won't work for inputs that are streams (for ex. sys.stdin)

  • Related