Home > Net >  Merge multiple csv with same column headers PYTHON
Merge multiple csv with same column headers PYTHON

Time:09-18

Can I merge multiple csv files that have the same header in one csv using python? I would like to keep the same column headers but if i merge my files using the terminal it creates a file with repeating headers; so there's a way to do it in Python?

CodePudding user response:

i hope you find this helpful

import pandas as pd

# this two dataframes are just for showcasing the merge
# But can be replaced but any other valid DataFrame
# even more than 2 dataframes, just need to repeat this process

df1 = pd.DataFrame()
df1['col1'] = 1,2,3
# it even supports dataframes with not exactly the same columns
df1['col3'] = 4,4,8

df2 = pd.DataFrame()
df2['col1'] = 4,5,6
df2['col2'] = 1,4,8

# here df3 gets the resulting merged DataFrame
df3 = df1.append(df2)

# As you can see they get merged
# but as the df1 didnt had the "col2" its values are NaN
# Same with df2 and "col3"
>>   test1  col3  col2
0      1   1.0   NaN
1      2   4.0   NaN
2      3   8.0   NaN
0      4   NaN   4.0
1      5   NaN   4.0
2      6   NaN   8.0

# dumps dataframes to "file.csv" in the current folder
df3.to_csv("file.csv")

CodePudding user response:

Quickest way is to use csvkit in particular csvstack.

cat csv_1.csv                                                                                                                                                              
id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'

cat csv_2.csv                                                                                                                                                              
id,col1,col2,col3
3,'bar','owl','crow'
4,'spam','eel','cow'

# To stdout
csvstack csv_1.csv csv_2.csv

id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'



# To new file.
csvstack csv_1.csv csv_2.csv > csv_1_2.csv

Using the global library csv.

import csv

header = None
new_file = []
for f in ('csv_1.csv', 'csv_2.csv'):
    with open(f, newline='') as csv_file:
        reader = csv.reader(csv_file)
        if not header:
            new_file.append(reader.__next__())
            header = True
        else:
            reader.__next__()
        for row in reader:
            new_file.append(row)

with open('csv_new_file.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    for row in new_file:
        writer.writerow(row)

cat csv_new_file.csv

id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'
  • Related