Spread percentage summary in dataframe pandas

Time:11-10

Suppose I have a pandas DataFrame with a single column:

A 20
B 20
C 15
D 10
E 10
F  8 
G  7
H  5
I  5

I want to split the rows by their cumulative share of the total, so that the groups covering the biggest 75%, the next 15%, and the last 10% are:

A        F        H     
B        G        I
C        
D
E

Is there a pandas function that can produce this summary quickly? Do I need to turn the index into a column first? I ask because the values come from df.value_counts() on my df DataFrame.

CodePudding user response:

The exact input and expected output are not fully clear, but assuming this DataFrame as input:

   col
A   20
B   20
C   15
D   10
E   10
F    8
G    7
H    5
I    5

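You can get a dictionary of the index labels in each group using:

import numpy as np
import pandas as pd

target = [75, 15, 10]

# Bin the running total into (0, 75], (75, 90], (90, 100]; this assumes the column sums to 100
group = pd.cut(df['col'].cumsum(), bins=np.r_[0, np.cumsum(target)], labels=target)

# Map each bin label to the index labels that fall into it
df.index.groupby(group)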

output: {75: ['A', 'B', 'C', 'D', 'E'], 15: ['F', 'G'], 10: ['H', 'I']}
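
The bins above are percentages, so the approach relies on the column summing to 100. Counts from value_counts() usually will not, so here is a minimal sketch that first converts the running total to a percentage of the overall total. The Series `counts` and its values are hypothetical (the example counts doubled), and the sketch assumes the data is already sorted in descending order, as value_counts() returns it by default.

```python
import numpy as np
import pandas as pd

# Hypothetical counts, e.g. the result of some_column.value_counts().
# They are sorted in descending order and sum to 200, not 100.
counts = pd.Series(
    [40, 40, 30, 20, 20, 16, 14, 10, 10],
    index=list("ABCDEFGHI"),
    name="col",
)

target = [75, 15, 10]  # percentage shares; assumed to sum to 100

# Running total expressed as a percentage of the overall total.
# Rounding guards against floating-point noise at the bin edges.
cum_pct = (counts.cumsum() / counts.sum() * 100).round(6)

# Bin edges 0, 75, 90, 100; each bin is labelled with its share.
bins = np.r_[0, np.cumsum(target)]
group = pd.cut(cum_pct, bins=bins, labels=target)

# Collect the index labels that fall into each bin.
result = {t: counts.index[(group == t).to_numpy()].tolist() for t in target}
print(result)
```

Because the groups are built from the index, there is no need to turn the index into a column first. Note that the shares in `target` must be distinct here, since pd.cut uses them as bin labels.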
