I need to create a new column that establishes relationships between entities in a delimited list ( any delimiter works except a comma ).
Dataframe:
df1 = pd.DataFrame(np.array([[1000, 'Jerry', 'BR1','BR1'],
[1001, 'Sal', 'BR2', 'BR1'],
[1002, 'Buck', 'BR3', 'BR2'],
[1003, 'Perry','BR4','BR1']]),
columns=['ID', 'Name', 'Branch', 'Member of'])
The final result should be:
ID Name Branch Member of Members
==== ==== ====== ========= =======
1000 Jerry BR1 BR1 Jerry, Sal, Perry
1001 Sal BR2 BR1 Buck
1002 Buck BR3 BR2 NaN
1003 Perry BR4 BR1 NaN
I need to create the "members" column by finding all matches in "Member of", but returning the "Name", then populate a list that ends up in "members".
Would np.where be a good place to start?
np.where(df['Branch'] == df['Member of'], ??, np.nan)
CodePudding user response:
Use groupby
to generate the list of members, then merge
:
s = df1.groupby('Member of')['Name'].apply(list).rename('Members')
df2 = df1.merge(s, left_on='Branch', right_index=True, how='left')
Output:
ID Name Branch Member of Members
0 1000 Jerry BR1 BR1 [Jerry, Sal, Perry]
1 1001 Sal BR2 BR1 [Buck]
2 1002 Buck BR3 BR2 NaN
3 1003 Perry BR4 BR1 NaN
NB. If you want a string and not a list, use ', '.join
in place of list
CodePudding user response:
Try this:
df1['Members'] = df1['Branch'].apply(lambda b: ', '.join(df1.loc[df1['Member of'] == b, 'Name'])).replace('', np.nan)
Output:
>>> df1
ID Name Branch Member of Members
0 1000 Jerry BR1 BR1 Jerry, Sal, Perry
1 1001 Sal BR2 BR1 Buck
2 1002 Buck BR3 BR2 NaN
3 1003 Perry BR4 BR1 NaN