I have a DataFrame like this:
df.head()
>>>
Date Region Manager SalesMan Item Units Unit_price Sale_amt
0 East Martha Alexander Television ... ... ...
1 Central Hermann Shelli Home Theater ... ... ...
2 Central Hermann Luis Television ... ... ...
3 Central Timothy David CellPhone ... ... ...
4 West Timothy Stephen Television ... ... ...
Here are the unique managers and salesmen:
df['Manager'].unique()
array(['Martha', 'Hermann', 'Timothy', 'Douglas'], dtype=object)
df['SalesMan'].unique()
array(['Alexander', 'Shelli', 'Luis', 'David', 'Stephen', 'Steven',
'Michael', 'Sigal', 'Diana', 'Karen', 'John'], dtype=object)
I want a DataFrame that contains the unique managers and, for each one, the list of unique salesmen under that manager. For example, for the DataFrame above, I want output like this:
Manager list_of_salesmen
Martha [Alexander]
Hermann [Shelli, Luis]
Timothy [David, Stephen]
I thought of using groupby but got stuck there. How do I go about solving this problem?
CodePudding user response:
You could use groupby.agg on Manager and pass list to SalesMan:
>>> df.groupby('Manager').agg({'SalesMan':list})
SalesMan
Manager
Hermann [Shelli, Luis]
Martha [Alexander]
Timothy [David, Stephen]
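If the same salesman shows up in several rows under one manager, passing list keeps every repeat. Here is a minimal sketch of one way to get unique names plus the list_of_salesmen column name from the question. It assumes a small made-up df with only the two relevant columns; the sample rows and the name result are illustrative, not taken from the original data:

```python
import pandas as pd

# Hypothetical sample rows (only the two relevant columns); "Luis" is
# repeated under Hermann on purpose to show the de-duplication.
df = pd.DataFrame({
    "Manager": ["Martha", "Hermann", "Hermann", "Timothy", "Timothy", "Hermann"],
    "SalesMan": ["Alexander", "Shelli", "Luis", "David", "Stephen", "Luis"],
})

result = (
    df.groupby("Manager", sort=False)["SalesMan"]  # sort=False keeps first-seen manager order
      .agg(lambda s: list(s.unique()))             # unique names, in order of appearance
      .reset_index(name="list_of_salesmen")        # back to a regular two-column DataFrame
)
print(result)
```

Dropping sort=False would return the managers in alphabetical order instead, as in the output shown above.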
CodePudding user response:
This can be done by building a dict that holds the data for the new DataFrame and converting it with pandas.DataFrame.from_dict():
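import pandas as pd
managers = df['Manager'].unique()
d = {'Manager': list(managers), 'SalesMan': []}
for manager in managers:
    # collect the unique salesmen that appear under this manager
    d['SalesMan'].append(list(df.loc[df['Manager'] == manager, 'SalesMan'].unique()))
df2 = pd.DataFrame.from_dict(d)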