Sum values of a column in a pandas DataFrame sorted by another column, up to a value N

Time:12-27

Suppose I have a DataFrame like this (already sorted by column B):

   A   B
0  2   5   
1  5   4
2  9   3
3  4   3
4  3   1

I need to select rows whose values in column A sum to at most 15. The order of the rows in the DataFrame must not change (it has to stay sorted by column B), but rows may be skipped when computing the sum. In this case, rows 0, 1, 3 and 4 give a sum over column A of 14. Basically, this is a conditional summation over a column.

Expected output: a DataFrame containing the rows whose column A values add up to <= N (15 here).

CodePudding user response:

IIUC:

df = df[df['A'].sort_values().cumsum() < 15]

OUTPUT

   A  B
0  2  5
1  5  4
3  4  3
4  3  1
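
The one-liner is easier to follow when broken into steps. The sketch below rebuilds the sample frame from the question; the names ordered, running and mask are illustrative only, and it uses <= with .loc on the assumption that a total of exactly 15 should be allowed.

```python
import pandas as pd

# Sample frame from the question, already sorted by B (descending).
df = pd.DataFrame({"A": [2, 5, 9, 4, 3], "B": [5, 4, 3, 3, 1]})

ordered = df["A"].sort_values()   # smallest A first; original index labels are kept
running = ordered.cumsum()        # running total in that sorted order
mask = running <= 15              # True while the running total stays within the limit

# .loc aligns the boolean mask on index labels,
# so the selected rows come back in df's original order.
result = df.loc[mask]
print(result)
```

Because the mask carries the original index labels, sorting A only decides which rows are kept, not the order in which they are returned.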

CodePudding user response:

Since you can skip over rows, the order in which rows are selected is unimportant (we can recover it afterward). The largest subset, in terms of number of rows, will be, as @MuhammadHassan said, the part of A, sorted ascending, whose cumulative sum stays within 15:

s = df['A'].sort_values().cumsum() <= 15
idx = df.index.intersection(s[s].index)

>>> idx.tolist()
[0, 1, 3, 4]

# and
>>> df.loc[idx]
   A  B
0  2  5
1  5  4
3  4  3
4  3  1

Edit

I'll leave the answer above for didactic purposes, but @MuhammadHassan's answer is correct and more concise. To prevent the warning "UserWarning: Boolean Series key will be reindexed to match DataFrame index", index with .loc, and use <= rather than < so that a total of exactly 15 is included (up to 15 means up to and including 15):

>>> df.loc[df['A'].sort_values().cumsum() <= 15]
   A  B
0  2  5
1  5  4
3  4  3
4  3  1
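
If the same selection is needed for other columns or limits, it could be wrapped in a small helper. This is only a sketch: the function name rows_within_limit and its parameters are hypothetical, and it assumes non-negative values in the column and unique index labels.

```python
import pandas as pd

def rows_within_limit(df: pd.DataFrame, col: str, limit: float) -> pd.DataFrame:
    """Keep the rows with the smallest `col` values whose running total is <= limit.

    Hypothetical helper; assumes non-negative values and unique index labels.
    """
    # mergesort is stable, so ties in `col` are resolved by original row order.
    running = df[col].sort_values(kind="mergesort").cumsum()
    # Label-aligned boolean selection preserves the frame's existing row order.
    return df.loc[running <= limit]

df = pd.DataFrame({"A": [2, 5, 9, 4, 3], "B": [5, 4, 3, 3, 1]})
out = rows_within_limit(df, "A", 15)
print(out)
```

Note that this greedy choice keeps as many rows as possible rather than getting the total as close to the limit as possible: in the sample data, rows 0, 2 and 3 add up to exactly 15, while the helper returns four rows totalling 14.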