Thanks for taking the time to go through this! :) I have a data frame that looks like this
z=pd.DataFrame.from_dict({'id':[1,2,3,4,5,6,7,8,9,10],
'name':['ax','bx','ayx','byz','ck','cl','akl','bwo','cop','bab'],
'value':[45,32,64,12,90,121,34,56,78,76],
'type':['x','y','x','y','z','z','y','z','x','z'],
'year':[1997,1198,1999,1999,1998,1997,1997,1998,1997,1997]})
I can find the values of top 2 types for each year using
z.groupby(['year','type'])['value'].nlargest(2)
How can I find the Top 2 names for each type of each year in the dataset that has highest value?
this is how the output will resemble
year category value name
1997 a 45 ax
a 34 akl
b 76 bab
b NULL NULL
c 121 cl
c 78 cop
value for 1997 -> b is null since there is only one value for b in dataset but we need top 2
CodePudding user response:
I believe you're looking for z.groupby(['year','type'])[['year', 'value']].value_counts()
which returns a hash table of the counts. You can grab the top to records from that.
You may have to call .reset_index()
on your dataframe to unindex your key.
CodePudding user response:
Based on your example DataFrame:
z.sort_values(["year","type","value"], ascending=[1,1,0])\
.groupby(["year","type"]).head(2).reset_index()
index id name value type year
0 1 2 bx 32 y 1198
1 8 9 cop 78 x 1997
2 0 1 ax 45 x 1997
3 6 7 akl 34 y 1997
4 5 6 cl 121 z 1997
5 9 10 bab 76 z 1997
6 4 5 ck 90 z 1998
7 7 8 bwo 56 z 1998
8 2 3 ayx 64 x 1999
9 3 4 byz 12 y 1999
Now for each year and type you have top 2 names. For top-n you may change value inside head() method.