Python Pandas Dataframe adding extra column based on duplicates-CodePudding

today I have been working on merging and editing data frames and I have been stuck with a very specific part. I have a column with names of certain fruit and then names of a person like this:

Fruit	Person
Banana	Jake
Banana	Paul
Carrot	Nancy
Carrot	Sydney
Carrot	Jane

Note that the "Person" column will always be unique. My goal is to get something like this with a third or fourth column:

Fruit	Person1	Person2	Person3
Banana	Jake	Paul
Carrot	Nancy	Sydney	Jane

I've tried this:

first = df.drop_duplicates(subset=['Zone','District','Area'],keep='First',inplace = True)
second = df.drop_duplicates(subset=['Zone','District','Area'],keep='Last',inplace = True)

and merging them together later but this will not get those that are in the middle such as "Syndey" from my example. I found this: HERE

But I don't really understand the groupby part. I hope this helps and thank you for your time and patience.

CodePudding user response：

Use df.groupby, Groupby.agg and pd.concat:

In [890]: df1 = df.groupby('Fruit').agg(list).reset_index()
In [905]: res = pd.concat([df1.Fruit, pd.DataFrame(df1.Person.tolist(), index=df1.index)], 1)

In [906]: res
Out[906]: 
    Fruit      0       1     2
0  Banana   Jake    Paul  None
1  Carrot  Nancy  Sydney  Jane