Python pandas create new dataframe out of column entries

Time:11-13

Using pandas in Python, what is the easiest way to do the following?

import pandas as pd
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

Resulting in:

    Animal  Max Speed
0   Falcon  380.0
1   Falcon  370.0
2   Parrot  24.0
3   Parrot  26.0

I would like to convert this into:

    Falcon  Parrot
0   380     24
1   370     26

I would have expected to be able to use the groupby method, but I cannot figure out how to create a new dataframe from its result.

df.groupby('Animal').to_frame() or something along those lines

Edit: after some experimenting, I found a solution in the form of:

df.groupby('Animal')['Max Speed'].apply(pd.DataFrame).apply(lambda x: pd.Series(x.dropna().to_numpy()))

But that seems quite clumsy; there is bound to be a better way, right?

CodePudding user response:

You can aggregate the values in lists during the groupby, convert to a dictionary, and then back to a DataFrame:

pd.DataFrame(dict(df.groupby('Animal')['Max Speed'].apply(list)))
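
Note that the dictionary approach needs every group to have the same number of rows, because the DataFrame constructor rejects lists of different lengths. As a minimal sketch, assuming the groups can differ in size (the third Falcon row below is made up for illustration), wrapping each list in a pd.Series lets pandas pad the shorter columns with NaN:

```python
import pandas as pd

# Hypothetical data: one extra Falcon row so the groups have unequal sizes.
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 390., 24., 26.]})

# One list of speeds per animal, as in the answer above.
lists = df.groupby('Animal')['Max Speed'].apply(list)

# Building each column from a Series aligns on the 0..n-1 index,
# so shorter groups are filled with NaN instead of raising an error.
padded = pd.DataFrame({animal: pd.Series(speeds) for animal, speeds in lists.items()})
print(padded)
```

With equal group sizes this gives the same table as the one-liner above.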

CodePudding user response:

Here is one way to do it:

(df.assign(seq=df.groupby('Animal').cumcount()) # add temp sequence to the dup rows
 .pivot(index='seq', columns='Animal')          # pivot using seq
 .droplevel(level=0, axis=1)                    # drop level 0 in column, resulting from Pivot
 .reset_index()                                 # reset index to make it non-multi-index
 .drop(columns='seq')                           # drop sequence column
 .rename_axis(columns=None)                     # rename column axis
)

Resulting in:

    Falcon  Parrot
0   380.0   24.0
1   370.0   26.0

Or:

# use unstack instead of pivot: set the index to seq and Animal,
# then unstack
# the rest of the solution is the same as the previous one

(df.assign(seq=df.groupby('Animal').cumcount())
 .set_index(['seq','Animal'])
 .unstack()
 .droplevel(level=0, axis=1)
 .reset_index()
 .drop(columns='seq')
 .rename_axis(columns=None)
)
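
Both variants above reshape every remaining column and then drop the extra column level. A shorter sketch of the same idea, assuming only 'Max Speed' is needed, passes values= to pivot so the columns come out flat and no droplevel step is required:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

result = (
    df.assign(seq=df.groupby('Animal').cumcount())  # position of each row within its group
      .pivot(index='seq', columns='Animal', values='Max Speed')
      .rename_axis(index=None, columns=None)         # remove the 'seq' and 'Animal' axis names
)
print(result)
```

The cumcount step is what makes the pivot possible: it gives each (seq, Animal) pair a unique position, so the repeated animal names no longer collide.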

CodePudding user response:

pd.pivot(df, values=["Max Speed"], columns=["Animal"]).apply(lambda x: pd.Series(x.dropna().to_numpy()))

This is another way to do it.
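
To see why the dropna step is needed, the one-liner can be split in two. This is a sketch rather than the answerer's exact code: it passes values='Max Speed' as a plain string (an assumption made here so the columns stay single-level), and the names wide and compact are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

# Pivoting on the original row index keeps each value in its own row,
# so every column holds NaN in the rows that belong to the other animal.
wide = df.pivot(columns='Animal', values='Max Speed')
print(wide)

# Dropping the NaN entries per column and rebuilding a Series re-numbers
# the values from 0, which lines the two columns up side by side.
compact = wide.apply(lambda col: pd.Series(col.dropna().to_numpy()))
print(compact)
```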
