I have a dataframe like this:
d = {'col1': ['Category1', 'Category1','Category1', 'Category1','Category1', 'Category2','Category2', 'Category2','Category2', 'Category2',],
'col2': ['Type1', 'Type2','Type3','Type4','Type5','Type1', 'Type2','Type3','Type4','Type5'],
'col3':[32,44,87,10,12,10,14,800,3200,35]}
df_test = pd.DataFrame(data=d)
df_test
col1 col2 col3
Category1 Type1 32
Category1 Type2 44
Category1 Type3 87
Category1 Type4 10
Category1 Type5 12
Category2 Type1 10
Category2 Type2 14
Category2 Type3 800
Category2 Type4 3200
Category2 Type5 35
What I'd like to get is the two rows Category 1 and Category 2 containing the sorted 3 largest Type based on col3 numbers. So the output would look like this:
col1 order1 order2 order3
Category1 Type3 Type2 Type1
Category2 Type4 Type3 Type5
I've tried:
df_test=df_test.groupby(['col1']).apply(lambda x: (x.groupby('col2')
.sum()
.sort_values('col3', ascending=False))
.head(3)).T
And other different ways with no success. Any ideas? Thank you
CodePudding user response:
df_t = df.pivot(columns='col2', index='col1')
df_t = df_t.apply(lambda s: [x[1] for x in s.nlargest(3).index], axis=1)
print(df_t)
Output:
col1
Category1 [Type3, Type2, Type1]
Category2 [Type4, Type3, Type5]
dtype: object
CodePudding user response:
df_test['col3'] = df_test.groupby('col1')['col3'].rank(ascending = False)
df1 = df_test.query('col3 <= 3').reset_index(drop = True)
df1['col4'] = 'order' df1['col3'].astype(int).astype(str)
df1.pivot('col1', 'col4', 'col2').reset_index()
col3 col1 order1 order2 order3
0 Category1 Type3 Type2 Type1
1 Category2 Type4 Type3 Type5