I want to understand the relationship between the values in two columns, A and B. Both are string variables. For example:
import pandas as pd

df = pd.DataFrame({'First_Name': ['Agatha', 'Agatha', 'Hercule', 'Hercule'],
                   'Last Name': ['Christie', 'Raisin', 'Poirot', 'Holmes']})
I want some kind of data product that shows me:
Agatha: ['Christie', 'Raisin']
Hercule: ['Poirot', 'Holmes']
I would like to be able to do this without a loop.
CodePudding user response:
df.groupby('First_Name', as_index=False)['Last Name'].agg(list)
  First_Name           Last Name
0     Agatha  [Christie, Raisin]
1    Hercule    [Poirot, Holmes]
To drop duplicate rows before grouping:
df.drop_duplicates().groupby('First_Name', as_index=False)['Last Name'].agg(list)
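
If the goal is a mapping shaped like the one in the question rather than a DataFrame, the grouped result can be converted to a plain dict. This is a minimal sketch using the question's sample data; the variable name names_by_first is only illustrative and does not come from the original posts.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ['Agatha', 'Agatha', 'Hercule', 'Hercule'],
    'Last Name': ['Christie', 'Raisin', 'Poirot', 'Holmes'],
})

# Group last names by first name; the result is a Series indexed by
# First_Name, which to_dict() turns into {first_name: [last_names]}.
# names_by_first is a hypothetical name chosen for this example.
names_by_first = df.groupby('First_Name')['Last Name'].agg(list).to_dict()

for first, lasts in names_by_first.items():
    print(f"{first}: {lasts}")
# Agatha: ['Christie', 'Raisin']
# Hercule: ['Poirot', 'Holmes']
```

Note that drop_duplicates() with no arguments only removes rows where both columns repeat, so the same last name under two different first names is kept.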