I'm using Python pandas and have a data frame that is pulled from my CSV file:
ID Value
123 10
432 14
213 12
'''
214 2
999 43
I was advised using the following code can randomly select some rows with the condition that the sum of the selected values = 30% of the total value. ("close to 30%" works)
out = df.sample(frac=1).loc[lambda d: d['Value'].cumsum().le(d['Value'].sum()*0.3)]
Now I want to sort the rows based on the value and select the top rolls until they add up to 30% of the total value.
Please advise.
CodePudding user response:
You can use pandas.DataFrame.sort_values
and then find the mask that are True
and select rows from df_sorted
with pandas.DataFrame.loc
.
df_sorted = df.sort_values('Value').reset_index(drop=True)
m = df_sorted['Value'].cumsum().le(df['Value'].sum()*0.3)
print(df_sorted.loc[m])
Output:
ID Value
0 214 2
1 123 10
2 213 12
CodePudding user response:
A possible solution:
df.iloc[
df['Value']
.sort_values()
.cumsum()[lambda x: x <= 0.3 * df['Value'].sum()]
.index]
Output:
ID Value
3 214 2
0 123 10
2 213 12