today I have been working on merging and editing data frames and I have been stuck with a very specific part. I have a column with names of certain fruit and then names of a person like this:
Fruit | Person |
---|---|
Banana | Jake |
Banana | Paul |
Carrot | Nancy |
Carrot | Sydney |
Carrot | Jane |
Note that the "Person" column will always be unique. My goal is to get something like this with a third or fourth column:
Fruit | Person1 | Person2 | Person3 |
---|---|---|---|
Banana | Jake | Paul | |
Carrot | Nancy | Sydney | Jane |
I've tried this:
first = df.drop_duplicates(subset=['Zone','District','Area'],keep='First',inplace = True)
second = df.drop_duplicates(subset=['Zone','District','Area'],keep='Last',inplace = True)
and merging them together later but this will not get those that are in the middle such as "Syndey" from my example. I found this: HERE
But I don't really understand the groupby
part. I hope this helps and thank you for your time and patience.
CodePudding user response:
Use df.groupby
, Groupby.agg
and pd.concat
:
In [890]: df1 = df.groupby('Fruit').agg(list).reset_index()
In [905]: res = pd.concat([df1.Fruit, pd.DataFrame(df1.Person.tolist(), index=df1.index)], 1)
In [906]: res
Out[906]:
Fruit 0 1 2
0 Banana Jake Paul None
1 Carrot Nancy Sydney Jane