I already looked at: How to combine 2 csv files with common column value, but both files have different number of lines and: Merging two CSV files using Python But both did not give the desired output I needed.
I have two csv files with the below data:
First file is - data1.csv
Name Dept Company
John Smith candy lead
Diana Princ candy lead
Perry Plat wood lead
Jerry Springer clothes lead
Calvin Klein clothes lead
Lincoln Tun warehouse lead
Oliver Twist kitchen lead
Second file is - data2.csv
Name Dept Company
John Smith candy lead
Tyler Perry candy lead
Perry Plat wood lead
Mary Poppins clothes lead
Calvin Klein clothes lead
Lincoln Tun warehouse lead
Herman Sherman kitchen lead
Jerry Springer clothes lead
Ivan Evans clothes lead
I want to merge them as one file, called newdata.csv
, sorting the Dept
column into groups and dropping the Company
column. The final output would look something like this:
Name Dept
John Smith candy
Diana Princ candy
Tyler Perry candy
Perry Plat wood
Jerry Springer clothes
Calvin Klein clothes
Mary Poppins clothes
Ivan Evans clothes
Lincoln Tun warehouse
Oliver Twist kitchen
Herman Sherman kitchen
I tried to use the merge function, but the output wasn't what I needed.
This is my code so far:
import pandas as pd
import os, csv, sys
csvPath1 = 'data1.csv'
csvPath2 = 'data2.csv'
csvDest = 'newdata.csv'
df1 = pd.read_csv(csvPath1)
df2 = pd.read_csv(csvPath2)
df1=df1.drop('Company', 1)
df2=df2.drop('Company', 1)
merged = df1.merge(df2)
merged=merged.sort_values('Dept')
merged.to_csv(csvDest, index=False)
CodePudding user response:
Merge is the SQL equivalent of join.
The function you need is concat
merged = pd.concat([df1, df2], axis=0, ignore_index=True)
CodePudding user response:
I ended up finding the answer to my own question. I did some digging and what worked for me was using:
merged=df1.append(df2)
merged=merged.sort_values('Dept')
So my final code output:
import pandas as pd
import os, csv, sys
csvPath1 = 'data1.csv'
csvPath2 = 'data2.csv'
csvDest = 'newdata.csv'
df1 = pd.read_csv(csvPath1)
df2 = pd.read_csv(csvPath2)
df1=df1.drop('Company', 1)
df2=df2.drop('Company', 1)
merged=df1.append(df2)
merged=merged.sort_values('Dept')
merged.to_csv(csvDest, index=False)