Can I merge multiple csv files that have the same header in one csv using python? I would like to keep the same column headers but if i merge my files using the terminal it creates a file with repeating headers; so there's a way to do it in Python?
CodePudding user response:
i hope you find this helpful
import pandas as pd
# this two dataframes are just for showcasing the merge
# But can be replaced but any other valid DataFrame
# even more than 2 dataframes, just need to repeat this process
df1 = pd.DataFrame()
df1['col1'] = 1,2,3
# it even supports dataframes with not exactly the same columns
df1['col3'] = 4,4,8
df2 = pd.DataFrame()
df2['col1'] = 4,5,6
df2['col2'] = 1,4,8
# here df3 gets the resulting merged DataFrame
df3 = df1.append(df2)
# As you can see they get merged
# but as the df1 didnt had the "col2" its values are NaN
# Same with df2 and "col3"
>> test1 col3 col2
0 1 1.0 NaN
1 2 4.0 NaN
2 3 8.0 NaN
0 4 NaN 4.0
1 5 NaN 4.0
2 6 NaN 8.0
# dumps dataframes to "file.csv" in the current folder
df3.to_csv("file.csv")
CodePudding user response:
Quickest way is to use csvkit in particular csvstack.
cat csv_1.csv
id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
cat csv_2.csv
id,col1,col2,col3
3,'bar','owl','crow'
4,'spam','eel','cow'
# To stdout
csvstack csv_1.csv csv_2.csv
id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'
# To new file.
csvstack csv_1.csv csv_2.csv > csv_1_2.csv
Using the global library csv.
import csv
header = None
new_file = []
for f in ('csv_1.csv', 'csv_2.csv'):
with open(f, newline='') as csv_file:
reader = csv.reader(csv_file)
if not header:
new_file.append(reader.__next__())
header = True
else:
reader.__next__()
for row in reader:
new_file.append(row)
with open('csv_new_file.csv', 'w', newline='') as csv_out:
writer = csv.writer(csv_out)
for row in new_file:
writer.writerow(row)
cat csv_new_file.csv
id,col1,col2,col3
1,'test','dog','cat'
2,'foo','fish','rabbit'
3,'bar','owl','crow'
4,'spam','eel','cow'