Single CSV output with data in different columns-CodePudding

I have a number of CSV files with data in the first three columns only. I want to copy data from each CSV file and paste it into one single CSV file in column order. For example data from the first CSV file goes into columns 1,2 and 3 in the output file. Similarly, data from the 2nd CSV goes to columns 4,5, and 6 of the same output CSV file and so on. Any help would be highly appreciated. Thanks.

I have tried the following code but it gets me the output in same columns only.

import glob
import pandas as pd
import time
import numpy as np
start = time.time()

Filename='Combined_Data.csv'

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

for i in range(len(all_filenames)):
    data= pd.read_csv(all_filenames[i],skiprows=23)
    data= data.rename({'G1': 'CH1', 'G2': 'CH2','Dis': 'CH3'},axis=1) 
    data= data[['CH1','CH2','CH3']]
    data= data.apply(pd.to_numeric, errors='coerce')
    print(all_filenames[i])
    if i == 0:
    data.to_csv(Filename,sep=',',index=False,header=True,mode='a')
    else:
    data.to_csv(Filename,sep=',',index=False,header=False,mode='a')

end = time.time()
print((end - start),'Seconds(Execution Time)')

CodePudding user response：

If you don't need to write your own code for this, I'd recommend GoCSV's zip command; it can also handle the CSVs having different numbers of rows.

I have three CSV files:

file1.csv

Dig1,Dig2,Dig3
1,2,3
4,5,6
7,8,9

file2.csv

Letter1,Letter2,Letter3
a,b,c
d,e,f

and file3.csv

RomNum1,RomNum2,RomNum3
I,II,III

When I run gocsv zip file2.csv file1.csv file3.csv I get:

Letter1,Letter2,Letter3,Dig1,Dig2,Dig3,RomNum1,RomNum2,RomNum3
a,b,c,1,2,3,I,II,III
d,e,f,4,5,6,,,
,,,7,8,9,,,

GoCSV is pre-built for a number of different OS'es.

CodePudding user response：

Here's how to do it with Python's CSV module, using these files:

file1.csv

Dig1,Dig2,Dig3
1,2,3
4,5,6
7,8,9

file2.csv

Letter1,Letter2,Letter3
a,b,c
d,e,f

and file3.csv

RomNum1,RomNum2,RomNum3
I,II,III

The more-memory-intensive option

This accumulates the final CSV one file at a time, expanding a list that represents the final CSV with with each new input CSV.

#!/usr/bin/env python3
import csv
import sys

csv_files = [
    'file2.csv',
    'file1.csv',
    'file3.csv',
]

all = []

for csv_file in csv_files:
    with open(csv_file) as f:
        reader = csv.reader(f)
        rows = list(reader)

        len_all = len(all)

        # First file, initialize all and continue (skip)
        if len_all == 0:
            all = rows
            continue

        # The number of columns in all so far
        len_cols = len(all[0])

        # Extend all with the new rows
        for i, row in enumerate(rows):
            # Check to make sure all has as many rows as this file
            if i >= len_all:
                all.append(['']*len_cols)

            all[i].extend(row)


# Finally, pad all rows on the right
len_cols = len(all[0])
for i in range(len(all)):
    len_row = len(all[i])
    if len_row < len_cols:
        col_diff = len_cols - len_row
        all[i].extend(['']*col_diff)


writer = csv.writer(sys.stdout)
writer.writerows(all)

The streaming option

This reads-and-writes a line/row at a time.

(this is basically a Python port of the Go code from GoCSV's zip, from above)

import csv
import sys

fnames = [
    'file2.csv',
    'file1.csv',
    'file3.csv',
]
num_files = len(fnames)

readers = [csv.reader(open(x)) for x in fnames]

# Collect "header" lines; each header defines the number
# of columns for its file
headers = []
num_cols = 0
offsets = [0]
for reader in readers:
    header = next(reader)
    headers.append(header)
    num_cols  = len(header)
    offsets.append(num_cols)

writer = csv.writer(sys.stdout)

# With all headers counted, every row must have this many columns
shell_row = [''] * num_cols

for i, header in enumerate(headers):
    start = offsets[i]
    end = offsets[i 1]
    shell_row[start:end] = header

# Write headers
writer.writerow(shell_row)

# Expect that not all CSVs have the same number of rows; some will "finish" ahead of others
file_is_complete = [False] * num_files
num_complete = 0

# Loop a row at a time...
while True:
    # ... for each CSV
    for i, reader in enumerate(readers):
        if file_is_complete[i]:
            continue

        start = offsets[i]
        end = offsets[i 1]
        try:
            row = next(reader)
            # Put this row in its place in the main row
            shell_row[start:end] = row
        except StopIteration:
            file_is_complete[i] = True
            num_complete  = 1
        except:
            raise

    if num_complete == num_files:
        break

    # Done iterating CSVs (for this row), write it
    writer.writerow(shell_row)

    # Reset for next main row
    shell_row = [''] * num_cols

For either, I get:

Letter1,Letter2,Letter3,Dig1,Dig2,Dig3,RomNum1,RomNum2,RomNum3
a,b,c,1,2,3,I,II,III
d,e,f,4,5,6,,,
,,,7,8,9,,,