Home > database >  Extract columns from 3 CSVs in to 1 CSV (Python)
Extract columns from 3 CSVs in to 1 CSV (Python)

Time:08-24

I'm looking for a way to combine 3 CSVs that contain information in the columns D and E like this with the first row called X Days:

    D       E      F      G
1   3 Days  4 Days      
2   $100    $200        
3   $111    $222        
4   ...     ... 
5   ...     ... 

I want to combine the 3 CSVs in to one but leave a column blank between them like this:

    Data from 1st CSV:    Data from 2nd CSV:       Data from 3nd CSV:

    D       E       F     G        H        I      J       K
1   3 Days  4 Days        3 Days   4 Days          3 Days   4 Days
2   $100    $200          $300     $400            $500     $600
3   $111    $222          $333     $444            $555     $666
4    ...     ...           ...      ...             ...      ...
5    ...     ...           ...      ...             ...      ...

How can I combine them like this? (Without the "Data from X CSV")

CodePudding user response:

So, you just open all of your CSV files in parallel. As long as they keep feeding input, you create new rows on output. This code stops as soon as the FIRST CSV file ends. If you need to run until the longest file, that will take more work. You'd use this like:

python merge.py aaa.csv bbb.csv ccc.csv

You could use glob to allow wildcards.

import sys
import csv

fls = [csv.reader(open(f)) for f in sys.argv[1:]]
fout = csv.writer(open('out.csv','w'))

try:
    while True:
        newrow = []
        for f in fls:
            row = next(f)
            newrow.extend( row[3:5]   [''] )
        fout.writerow( newrow )
        print(newrow)
except StopIteration:
    print('Done')

CodePudding user response:

read all three csvs in to dataframes and then use pd.concat(...,axis=1) to concat them as new columns.

df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)
df3 = pd.read_csv(file3)

df1['F'] = '' #making those blank columns
df2['I'] = ''
df_final = pd.concat([df1,df2,df3],axis=1)
df_final.to_csv(filename)
  • Related