I have a csv file with 4 columns:
Name Dept Email Name Hair Color
John Smith candy Lincoln Tun brown
Diana Princ candy John Smith gold
Perry Plat wood Oliver Twist bald
Jerry Springer clothes Diana Princ gold
Calvin Klein clothes
Lincoln Tun warehouse
Oliver Twist kitchen
I want to match the columns Name
and Email Name
by names.
This what the final output should look like:
Name Dept Email Name Hair Color
John Smith candy John Smith gold
Diana Princ candy Diana Princ gold
Perry Plat wood
Jerry Springer clothes
Calvin Klein clothes
Lincoln Tun warehouse Lincoln Tun brown
Oliver Twist kitchen Oliver Twist bald
I tried something like this in my code:
dfs = np.split(df,len(df.columns), axis=1)
dfs = [df.set_index(df.columns[0], drop=False) for df in dfs]
f=dfs[0].join(dfs[1:]).reset_index(drop=True).fillna(0)
Which sorted my two columns great but made everything else 0's
Name Dept Email Name Hair Color
John Smith 0 John Smith 0
Diana Princ 0 Diana Princ 0
Perry Plat 0 0 0
Jerry Springer 0 0 0
Calvin Klein 0 0 0
Lincoln Tun 0 Lincoln Tun 0
Oliver Twist 0 Oliver Twist 0
Here is my code so far:
import pandas as pd
import numpy as np
import os, csv, sys
csvPath = 'User.csv'
df= pd.read_csv(csvPath)
dfs = np.split(df,len(df.columns), axis=1)
dfs = [df.set_index(df.columns[0], drop=False) for df in dfs]
f=dfs[0].join(dfs[1:]).reset_index(drop=True).fillna(0)
testCSV = 'test_user.csv' #to check my csv file
f.to_csv(testCSV, encoding='utf-8') #send it to csv
CodePudding user response:
You could use merge for that:
pd.merge(df[['Name','Dept']],df[['Email Name','Hair Color']], left_on='Name', right_on='Email Name', how='left')
Result
Name Dept Email Name Hair Color
0 John Smith candy John Smith gold
1 Diana Princ candy Diana Princ gold
2 Perry Plat wood NaN NaN
3 Jerry Springer clothes NaN NaN
4 Calvin Klein clothes NaN NaN
5 Lincoln Tun warehouse Lincoln Tun brown
6 Oliver Twist kitchen Oliver Twist bald