In Python's pandas package, what is the easiest way to do the following?
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
Resulting in:
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
I want to convert this into:
   Falcon  Parrot
0     380      24
1     370      26
I would have expected to be able to use the groupby method, but I can't figure out how to build a new DataFrame from its result. I was hoping for something along the lines of:
df.groupby('Animal').to_frame()
Edit: after some experimenting, I found this solution:
df.groupby('Animal')['Max Speed'].apply(pd.DataFrame).apply(lambda x: pd.Series(x.dropna().to_numpy()))
But that seems quite clumsy. Surely there is a better way?
Answer:
You can aggregate the values into lists during the groupby, convert the result to a dictionary, and then build a DataFrame from that:
pd.DataFrame(dict(df.groupby('Animal')['Max Speed'].apply(list)))
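A self-contained sketch of the list-aggregation approach above. One caveat worth knowing: building a DataFrame from a dict of plain lists raises a ValueError when the lists differ in length, so this only works if every animal has the same number of rows. The second variant (my own addition, not part of the answer) builds one Series per group instead; pandas aligns those on their index and pads shorter columns with NaN.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

# Collect each animal's speeds into a list, then turn {animal: [speeds]} into columns.
# Assumes every group has the same number of rows.
result = pd.DataFrame(df.groupby('Animal')['Max Speed'].apply(list).to_dict())

# Variant tolerant of unequal group sizes: one Series per group, renumbered 0..n-1,
# so the columns line up by position and missing slots become NaN.
padded = pd.DataFrame({name: speeds.reset_index(drop=True)
                       for name, speeds in df.groupby('Animal')['Max Speed']})

print(result)
print(padded)
```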
Answer:
Here is one way to do it:
(df.assign(seq=df.groupby('Animal').cumcount())  # number the rows within each animal: 0, 1, ...
   .pivot(index='seq', columns='Animal')          # seq becomes the index, each animal a column
   .droplevel(level=0, axis=1)                    # drop the 'Max Speed' level pivot adds to the columns
   .reset_index()                                 # move seq back out of the index into a column
   .drop(columns='seq')                           # discard the helper seq column
   .rename_axis(columns=None)                     # remove the 'Animal' name from the column axis
)
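A more compact variant of the same cumcount-and-pivot idea, sketched under the assumption that only the 'Max Speed' column needs reshaping. Passing values='Max Speed' keeps the columns single-level, so droplevel is unnecessary, and reset_index(drop=True) replaces the reset_index/drop pair.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

result = (
    df.assign(seq=df.groupby('Animal').cumcount())                # 0, 1, ... within each animal
      .pivot(index='seq', columns='Animal', values='Max Speed')   # a single values column keeps columns flat
      .reset_index(drop=True)                                     # throw away the helper seq index
      .rename_axis(columns=None)                                  # drop the 'Animal' label on the column axis
)
print(result)
```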
   Falcon  Parrot
0   380.0    24.0
1   370.0    26.0
OR
# using unstack instead of pivot: set the index to (seq, Animal),
# then unstack the Animal level into columns;
# the rest of the solution is the same as the previous one
(df.assign(seq=df.groupby('Animal').cumcount())
.set_index(['seq','Animal'])
.unstack()
.droplevel(level=0, axis=1)
.reset_index()
.drop(columns='seq')
.rename_axis(columns=None)
)
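The unstack route can also be shortened by selecting the 'Max Speed' Series before unstacking, which again avoids the extra column level. A minimal sketch, assuming the same df as in the question:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

result = (
    df.assign(seq=df.groupby('Animal').cumcount())
      .set_index(['seq', 'Animal'])['Max Speed']  # a Series indexed by (seq, Animal)
      .unstack('Animal')                          # move the Animal level into the columns
      .reset_index(drop=True)                     # discard the seq index
      .rename_axis(columns=None)                  # drop the 'Animal' label on the column axis
)
print(result)
```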
Answer:
pd.pivot(df, values='Max Speed', columns='Animal').apply(lambda x: pd.Series(x.dropna().to_numpy()))
This is another way to do it. Thank you.
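A self-contained sketch of this pivot-and-compact approach. Omitting index makes pivot keep df's existing 0..3 index, so each column is half NaN; the lambda then drops the NaNs and renumbers what is left. Passing values and columns as plain strings (rather than one-element lists) keeps the result's columns single-level. One assumption to be aware of: dropna() would also discard genuinely missing measurements, so this is only safe if 'Max Speed' has no real NaNs.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

# One column per animal on the original index; rows that belong to other animals are NaN.
wide = df.pivot(columns='Animal', values='Max Speed')

# For each column, keep the non-NaN values and renumber them from 0.
result = wide.apply(lambda col: pd.Series(col.dropna().to_numpy()))
print(result)
```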