Home > Net >  Merge more than 2 csv files in python
Merge more than 2 csv files in python

Time:12-06

How can I merge more than 2 files that e.g. look like these

first csv file:

email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20

second csv file:

email,joe,mr
email,doe,mrs
name,jenny,gogh
year,talia,97

I would like to merge these files to look like this:

email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan

any help would be appreciated

CodePudding user response:

Use DataFrame.merge with left or default inner join:

#convert files to DataFrames, if no header added header=None
df1 = pd.read_csv(file1, header=None)
df2 = pd.read_csv(file2, header=None)

#left join by first 2 columns
df = df1.merge(df2, on=[0,1], how='left')
print (df)
       0       1           2_x   2_y
0  email     joe    @gmail.com    mr
1  email     doe  @hotmail.com   mrs
2   name  emilly           doe   NaN
3   name   jenny           van  gogh
4   year   talia            19    97
5   year   kevin            20   NaN

If need values skipped:

#inner join by first 2 columns
df = df1.merge(df2, on=[0,1])
print (df)
       0      1           2_x   2_y
0  email    joe    @gmail.com    mr
1  email    doe  @hotmail.com   mrs
2   name  jenny           van  gogh
3   year  talia            19    97

#write to file
df.to_csv(file3, index=False, header=False)

CodePudding user response:

Update:

pd.merge(df1, df2, on=[0, 1], how='outer') \
  .to_csv('output.csv', index=False, header=False, na_rep='nan')

# Content of file:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan


Old answer (before you fixed your post)

Try:

df1 = pd.read_csv('data1.csv', header=None)
df2 = pd.read_csv('data2.csv', header=None)

out = pd.concat([df1, df2]).groupby(1, as_index=False, sort=False) \
                           .agg({0: 'first', 1: 'first', 2: list})

out = pd.concat([out, out[2].apply(pd.Series)], axis=1).drop(columns=2)

out.to_csv('output.csv', index=False, header=False, na_rep='nan')

Content of output.csv file:

email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan
  • Related