How can I merge more than 2 files that e.g. look like these
first csv file:
email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20
second csv file:
email,joe,mr
email,doe,mrs
name,jenny,gogh
year,talia,97
I would like to merge these files to look like this:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan
any help would be appreciated
CodePudding user response:
Use DataFrame.merge
with left or default inner join:
#convert files to DataFrames, if no header added header=None
df1 = pd.read_csv(file1, header=None)
df2 = pd.read_csv(file2, header=None)
#left join by first 2 columns
df = df1.merge(df2, on=[0,1], how='left')
print (df)
0 1 2_x 2_y
0 email joe @gmail.com mr
1 email doe @hotmail.com mrs
2 name emilly doe NaN
3 name jenny van gogh
4 year talia 19 97
5 year kevin 20 NaN
If need values skipped:
#inner join by first 2 columns
df = df1.merge(df2, on=[0,1])
print (df)
0 1 2_x 2_y
0 email joe @gmail.com mr
1 email doe @hotmail.com mrs
2 name jenny van gogh
3 year talia 19 97
#write to file
df.to_csv(file3, index=False, header=False)
CodePudding user response:
Update:
pd.merge(df1, df2, on=[0, 1], how='outer') \
.to_csv('output.csv', index=False, header=False, na_rep='nan')
# Content of file:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan
Old answer (before you fixed your post)
Try:
df1 = pd.read_csv('data1.csv', header=None)
df2 = pd.read_csv('data2.csv', header=None)
out = pd.concat([df1, df2]).groupby(1, as_index=False, sort=False) \
.agg({0: 'first', 1: 'first', 2: list})
out = pd.concat([out, out[2].apply(pd.Series)], axis=1).drop(columns=2)
out.to_csv('output.csv', index=False, header=False, na_rep='nan')
Content of output.csv
file:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan