I'm using Python pandas and have a data frame that is pulled from my CSV file:
ID Value
123 10
432 14
213 12
...
214 2
999 43
I want to randomly select some rows with the condition that the sum of the selected values equals 30% of the total value.
Please advise how I should write this condition.
CodePudding user response:
You can first shuffle the rows with sample, then filter using loc, keeping only the rows whose cumsum of Value is ≤ 30% of the total:
out = df.sample(frac=1).loc[lambda d: d['Value'].cumsum().le(d['Value'].sum()*0.3)]
Example output:
ID Value
0 123 10
3 214 2
2 213 12
Intermediates:
ID Value cumsum ≤30%
0 123 10 10 True
3 214 2 12 True
2 213 12 24 True
1 432 14 38 False
4 999 43 81 False
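For a version you can run end to end, here is a minimal sketch of the same shuffle-then-cumsum idea. The function name sample_up_to_fraction and its seed parameter are hypothetical, added only to make the result reproducible; the frame is rebuilt from the five rows shown in the question, whereas yours would come from your CSV. It assumes the values are non-negative, so the running total never decreases. Note that it selects rows whose sum is at most 30% of the total, which may fall short of exactly 30% (and can even be empty if the first shuffled row is already too large).

```python
import pandas as pd

# Rows shown in the question; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({"ID": [123, 432, 213, 214, 999],
                   "Value": [10, 14, 12, 2, 43]})


def sample_up_to_fraction(df, frac=0.3, seed=None):
    """Shuffle the rows, then keep those whose running total of 'Value'
    stays at or below frac * (total of 'Value').

    Hypothetical helper; assumes 'Value' holds non-negative numbers.
    """
    limit = df["Value"].sum() * frac
    # frac=1 returns every row in random order; seed makes it repeatable.
    shuffled = df.sample(frac=1, random_state=seed)
    # The boolean mask is True for the leading rows that fit under the limit.
    return shuffled.loc[shuffled["Value"].cumsum().le(limit)]


out = sample_up_to_fraction(df, seed=0)
print(out)
print("selected sum:", out["Value"].sum(), "limit:", df["Value"].sum() * 0.3)
```

Calling the function without a seed gives a different random selection on each run, as in the one-liner above.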