Suppose I have a DataFrame like this (already sorted by column B):
   A  B
0  2  5
1  5  4
2  9  3
3  4  3
4  3  1
Now I need to select rows so that the sum of column A is at most 15. The order of the rows in the DataFrame must not change (it has to stay sorted by column B), but rows may be skipped when computing the sum. In this case, rows 0, 1, 3 and 4 give a sum over column A of 2 + 5 + 4 + 3 = 14. Basically, a conditional summation over a column.
Expected output: the rows of df whose column A values add up to at most N (15 here).
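For anyone who wants to reproduce this, the frame above can be built roughly as follows (a minimal sketch; the variable name N for the limit is only a label chosen here):

```python
import pandas as pd

# Rebuild the example frame; it is already sorted by B in descending order.
df = pd.DataFrame({"A": [2, 5, 9, 4, 3], "B": [5, 4, 3, 3, 1]})

N = 15  # upper bound on the sum of the selected values of A

# The selection described above: rows 0, 1, 3 and 4.
wanted = df.loc[[0, 1, 3, 4]]
print(wanted["A"].sum())  # 14
```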
CodePudding user response:
IIUC:
df = df[df['A'].sort_values().cumsum() < 15]
OUTPUT
   A  B
0  2  5
1  5  4
3  4  3
4  3  1
CodePudding user response:
Since you can skip rows, the order in which rows are selected is unimportant (we can recover it afterward). The largest subset will be, as @MuhammadHassan said, the part of A, sorted, whose cumulative sum stays at or below 15:
s = df['A'].sort_values().cumsum() <= 15
idx = df.index.intersection(s[s].index)
>>> idx.tolist()
[0, 1, 3, 4]
# and
>>> df.loc[idx]
   A  B
0  2  5
1  5  4
3  4  3
4  3  1
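To see why those four labels come out, it may help to look at the intermediate running total. This is only an illustrative sketch on the example data; the name running is not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 5, 9, 4, 3], "B": [5, 4, 3, 3, 1]})

# Running total of A, taken from the smallest value upward.
# The Series keeps the original row labels, so it can be mapped back to df.
running = df["A"].sort_values().cumsum()

# (row label, running total) pairs in sorted order
print(list(zip(running.index.tolist(), running.tolist())))
# [(0, 2), (4, 5), (3, 9), (1, 14), (2, 23)]
```

Only label 2 (A = 9) pushes the total past 15, so it is the one row that gets dropped.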
Edit
I'll leave the answer above for didactic purposes, but @MuhammadHassan's answer is correct and more concise. To prevent the warning "UserWarning: Boolean Series key will be reindexed to match DataFrame index" (and to select up to 15, which means up to and including 15), use .loc together with <=:
>>> df.loc[df['A'].sort_values().cumsum() <= 15]
   A  B
0  2  5
1  5  4
3  4  3
4  3  1
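If this is needed more than once, the one-liner could be wrapped in a small helper. The function name rows_within_limit and its parameters are hypothetical, and the sketch assumes the values in the column are non-negative and the index is unique:

```python
import pandas as pd


def rows_within_limit(df: pd.DataFrame, col: str, limit: float) -> pd.DataFrame:
    """Keep as many rows as possible while df[col] sums to at most `limit`.

    Hypothetical helper; assumes df[col] is non-negative and the index is unique.
    """
    # Stable sort so that equal values are taken in their original row order.
    running = df[col].sort_values(kind="stable").cumsum()
    # .loc aligns the boolean mask on the index, so the original row order
    # of df is preserved in the result.
    return df.loc[running <= limit]


df = pd.DataFrame({"A": [2, 5, 9, 4, 3], "B": [5, 4, 3, 3, 1]})
result = rows_within_limit(df, "A", 15)
print(result.index.tolist())  # [0, 1, 3, 4]
print(result["A"].sum())      # 14
```

Note that taking the smallest values first maximizes the number of rows kept, not the sum itself: here rows 0, 2 and 3 would reach exactly 15 with only three rows.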