Home > Net >  Python Pandas Dataframe adding extra column based on duplicates
Python Pandas Dataframe adding extra column based on duplicates

Time:11-23

today I have been working on merging and editing data frames and I have been stuck with a very specific part. I have a column with names of certain fruit and then names of a person like this:

Fruit Person
Banana Jake
Banana Paul
Carrot Nancy
Carrot Sydney
Carrot Jane

Note that the "Person" column will always be unique. My goal is to get something like this with a third or fourth column:

Fruit Person1 Person2 Person3
Banana Jake Paul
Carrot Nancy Sydney Jane

I've tried this:

first = df.drop_duplicates(subset=['Zone','District','Area'],keep='First',inplace = True)
second = df.drop_duplicates(subset=['Zone','District','Area'],keep='Last',inplace = True)

and merging them together later but this will not get those that are in the middle such as "Syndey" from my example. I found this: HERE

But I don't really understand the groupby part. I hope this helps and thank you for your time and patience.

CodePudding user response:

Use df.groupby, Groupby.agg and pd.concat:

In [890]: df1 = df.groupby('Fruit').agg(list).reset_index()
In [905]: res = pd.concat([df1.Fruit, pd.DataFrame(df1.Person.tolist(), index=df1.index)], 1)

In [906]: res
Out[906]: 
    Fruit      0       1     2
0  Banana   Jake    Paul  None
1  Carrot  Nancy  Sydney  Jane
  • Related