How to combine two csv files together-CodePudding

I already looked at: How to combine 2 csv files with common column value, but both files have different number of lines and: Merging two CSV files using Python But both did not give the desired output I needed.

I have two csv files with the below data:

First file is - data1.csv

Name             Dept        Company  
John Smith       candy       lead
Diana Princ      candy       lead
Perry Plat       wood        lead
Jerry Springer   clothes     lead
Calvin Klein     clothes     lead   
Lincoln Tun      warehouse   lead   
Oliver Twist     kitchen     lead

Second file is - data2.csv

Name             Dept        Company  
John Smith       candy       lead
Tyler Perry      candy       lead
Perry Plat       wood        lead
Mary Poppins     clothes     lead
Calvin Klein     clothes     lead   
Lincoln Tun      warehouse   lead   
Herman Sherman   kitchen     lead
Jerry Springer   clothes     lead
Ivan Evans       clothes     lead

I want to merge them as one file, called newdata.csv, sorting the Dept column into groups and dropping the Company column. The final output would look something like this:

Name             Dept        
John Smith       candy       
Diana Princ      candy       
Tyler Perry      candy       
Perry Plat       wood       
Jerry Springer   clothes     
Calvin Klein     clothes     
Mary Poppins     clothes     
Ivan Evans       clothes     
Lincoln Tun      warehouse   
Oliver Twist     kitchen     
Herman Sherman   kitchen

I tried to use the merge function, but the output wasn't what I needed.

This is my code so far:

import pandas as pd
import os, csv, sys

csvPath1 = 'data1.csv'
csvPath2 = 'data2.csv'
csvDest = 'newdata.csv'

df1 = pd.read_csv(csvPath1)
df2 = pd.read_csv(csvPath2)

df1=df1.drop('Company', 1)
df2=df2.drop('Company', 1)

merged = df1.merge(df2)
merged=merged.sort_values('Dept')

merged.to_csv(csvDest, index=False)

CodePudding user response：

Merge is the SQL equivalent of join.

The function you need is concat

merged = pd.concat([df1, df2], axis=0, ignore_index=True)

CodePudding user response：

I ended up finding the answer to my own question. I did some digging and what worked for me was using:

merged=df1.append(df2)
merged=merged.sort_values('Dept')

So my final code output:

import pandas as pd
import os, csv, sys

csvPath1 = 'data1.csv'
csvPath2 = 'data2.csv'
csvDest = 'newdata.csv'

df1 = pd.read_csv(csvPath1)
df2 = pd.read_csv(csvPath2)

df1=df1.drop('Company', 1)
df2=df2.drop('Company', 1)

merged=df1.append(df2)
merged=merged.sort_values('Dept')

merged.to_csv(csvDest, index=False)