How to get the top n values in a pandas DataFrame if it has repeated values

Time:01-20

Say I have a pandas DataFrame:

x y z
1 a x
1 b y
1 c z
2 a x
2 b x
3 a y
4 a z

If I want the top 2 values by column x, keeping every row that has one of those values, the result should be:

x y z
1 a x
1 b y
1 c z
2 a x
2 b x

If I want the top 2 values by column y, the result should be:

x y z
1 a x
1 b y
2 a x
2 b x
3 a y
4 a z

How can I achieve this?

CodePudding user response:

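You can use `value_counts` to find the two most frequent values in the column, then keep the rows whose value is among them with `isin`: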

>>> df[df['x'].isin(df['x'].value_counts().head(2).index)]
   x  y  z
0  1  a  x
1  1  b  y
2  1  c  z
3  2  a  x
4  2  b  x

>>> df[df['y'].isin(df['y'].value_counts().head(2).index)]
   x  y  z
0  1  a  x
1  1  b  y
3  2  a  x
4  2  b  x
5  3  a  y
6  4  a  z
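One caveat with `.head(2)`: if several values share the same count at the cut-off, it keeps only some of them. A minimal sketch, assuming you want tied values kept, swaps in `Series.nlargest(n, keep='all')` on the counts. The function name `top_n_frequent` is hypothetical, chosen for illustration.

```python
import pandas as pd

def top_n_frequent(df, col, n):
    # Hypothetical helper: keep rows whose value in `col` is among
    # the n most frequent values of that column.
    counts = df[col].value_counts()
    # keep='all' also keeps every value tied with the n-th largest count
    keep = counts.nlargest(n, keep='all').index
    return df[df[col].isin(keep)]

df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 3, 4],
                   'y': ['a', 'b', 'c', 'a', 'b', 'a', 'a'],
                   'z': ['x', 'y', 'z', 'x', 'x', 'y', 'z']})

print(top_n_frequent(df, 'x', 2))
print(top_n_frequent(df, 'y', 2))
```

With `n=3` on column x, the values 3 and 4 both occur once and tie for third place, so `keep='all'` returns every row instead of dropping one of them arbitrarily.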

CodePudding user response:

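import pandas as pd

def select_top_k(df, col, top_k):
    # groupby sorts the keys by default (sort=True), so the first top_k
    # group keys are the top_k smallest distinct values in `col`
    grouping_df = df.groupby(col)
    gr_list = list(grouping_df.groups)[:top_k]

    # keep only the groups whose key is among the selected values
    temp = grouping_df.filter(lambda x: x[col].iloc[0] in gr_list)
    return temp

data = {'x': [1, 1, 1, 2, 2, 3, 4],
        'y': ['a', 'b', 'c', 'a', 'b', 'a', 'a'],
        'z': ['x', 'y', 'z', 'x', 'x', 'y', 'z']}
df = pd.DataFrame(data)

col = 'x'
top_k = 2

select_top_k(df, col, top_k)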
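The function above selects by sorted key order rather than by frequency. The same idea can be sketched without `filter` by numbering the groups with `GroupBy.ngroup()`, which follows the sorted key order when `sort=True` (the default). The helper name `top_n_by_value` and its `largest` flag are hypothetical, added only to show both directions.

```python
import pandas as pd

def top_n_by_value(df, col, n, largest=False):
    # Hypothetical helper: keep rows whose value in `col` is among the
    # n smallest distinct values (or the n largest when largest=True).
    # ngroup() numbers groups 0..ngroups-1 in sorted key order;
    # ascending=False reverses that so the largest key gets number 0.
    group_no = df.groupby(col).ngroup(ascending=not largest)
    return df[group_no < n]

df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 3, 4],
                   'y': ['a', 'b', 'c', 'a', 'b', 'a', 'a'],
                   'z': ['x', 'y', 'z', 'x', 'x', 'y', 'z']})

print(top_n_by_value(df, 'x', 2))
print(top_n_by_value(df, 'x', 2, largest=True))
```

On this sample data, the smallest-values and most-frequent-values interpretations happen to give the same rows for both x and y, so it is worth checking which one you actually need on your real data.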