Randomly select 30% of the sum value in Python Pandas

Time:12-30

I'm using Python pandas and have a data frame that is pulled from my CSV file:

ID          Value
123         10
432         14
213         12

...

214         2
999         43

I want to randomly select some rows with the condition that the sum of the selected values = 30% of the total value.

Please advise how I should write this condition.

CodePudding user response:

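You can first shuffle the rows with sample, then filter with loc, using cumsum to keep only the rows whose running total is ≤ 30% of the overall total. Note that this gives a sum of at most 30%, not necessarily exactly 30%: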

out = df.sample(frac=1).loc[lambda d: d['Value'].cumsum().le(d['Value'].sum()*0.3)]

Example output:

   ID  Value
0  123     10
3  214      2
2  213     12

Intermediates:

    ID  Value  cumsum   ≤30%
0  123     10      10   True
3  214      2      12   True
2  213     12      24   True
1  432     14      38  False
4  999     43      81  False
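
For reference, the same idea can be written out step by step as a small, reproducible sketch. The function name sample_up_to_fraction and its parameters column, fraction and seed are made up here for illustration, and the data frame contains only the five rows visible in the question (the elided rows are left out). It also assumes the values are non-negative, so the running total never decreases:

```python
import pandas as pd

# Only the five rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({"ID": [123, 432, 213, 214, 999],
                   "Value": [10, 14, 12, 2, 43]})


def sample_up_to_fraction(df, column="Value", fraction=0.3, seed=None):
    """Shuffle the rows, then keep those whose running total stays at or
    below `fraction` of the column's overall sum (hypothetical helper)."""
    # frac=1 returns every row in random order; seed makes it repeatable.
    shuffled = df.sample(frac=1, random_state=seed)
    # Upper bound for the selected sum, e.g. 30% of the total.
    limit = df[column].sum() * fraction
    # Running total in shuffled order, compared against the bound.
    keep = shuffled[column].cumsum().le(limit)
    return shuffled.loc[keep]


out = sample_up_to_fraction(df, seed=0)
print(out)
print(out["Value"].sum(), "selected out of", df["Value"].sum())
```

Because the cut happens at the first row that pushes the running total past the limit, the result can fall well short of 30%, and it is empty whenever the first shuffled row alone exceeds the limit (for example, if the row with Value 43 comes first). Passing a different seed, or seed=None, gives a different random selection.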