Pandas: distribute a number throughout a column until it runs out

I have a data frame like this:

import pandas as pd

df = pd.DataFrame(data={'Name': ['a1', 'a2', 'a1'], 'Quantity': [8, 4, 5], 'Order': [1, 1, 2], 'Result': [0, 0, 0]})

Name	Quantity	Order
a1	8	1
a2	4	1
a1	5	2

I would like to distribute a number, N = 10, over the rows of the data frame according to two rules:

Quantity: each row should receive, when possible, at most the amount requested in Quantity
Order: rows are served in increasing Order, where Order is the sequence number inside each Name group

The desired result should be:

Name	Quantity	Order	Result
a1	8	1	8
a2	4	1	2
a1	5	2	0

That is, I first try to satisfy Name = a1, Order = 1 by giving it 8, then Name = a2, Order = 1 by giving it only 2, since that is all that remains of the initial N = 10. Name = a1, Order = 2 gets nothing.

The only way I have found to solve this problem is:

An initial groupby on Name to get the keys of all the groups (only two, a1 and a2, in the example)
Two nested for loops, the outer over Order and the inner over Name, running until N = 0
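
For reference, the loop-based approach described above might look roughly like the sketch below. The helper name distribute_loop and the alphabetical tie-break between names at the same Order are assumptions made for illustration; the question does not show the original loop code.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['a1', 'a2', 'a1'],
    'Quantity': [8, 4, 5],
    'Order': [1, 1, 2],
})


def distribute_loop(df, n):
    """Hand out n row by row: lowest Order first, then Name alphabetically."""
    remaining = n
    result = pd.Series(0, index=df.index)
    for order in sorted(df['Order'].unique()):        # outer loop: Order
        for name in sorted(df['Name'].unique()):      # inner loop: Name
            mask = (df['Order'] == order) & (df['Name'] == name)
            for idx in df.index[mask]:
                # give the requested quantity, or whatever is left if less
                give = min(int(df.at[idx, 'Quantity']), remaining)
                result.at[idx] = give
                remaining -= give
    return result


df['Result'] = distribute_loop(df, 10)
print(df)
```

This serves as a baseline to check any vectorised version against: both should produce the same Result column for the same N.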

Is there a way to avoid the for loops by using some pandas vectorised functions?

CodePudding user response：

You can use:

N = 10

# sort rows by priority, then take the running total of the requests
s = df.sort_values(by=['Order', 'Name'])['Quantity'].cumsum()
# cap the running total at N, then undo the cumsum with diff;
# the first row has no predecessor, so fill it from the capped total
capped = s.clip(upper=N)
df['Result'] = capped.diff().fillna(capped).astype(int)

output:

  Name  Quantity  Order  Result
0   a1         8      1       8
1   a2         4      1       2
2   a1         5      2       0
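
To see why the cumsum/clip/diff approach works, it can help to look at the intermediate values. The sketch below uses the same three rows as the question but deliberately stored in a different order (an assumption made for illustration), to show that the result is matched back to the original rows by index label. The names ordered, running, capped and allocated are illustrative only.

```python
import pandas as pd

# Same rows as in the question, but stored out of priority order
df = pd.DataFrame({
    'Name': ['a1', 'a2', 'a1'],
    'Quantity': [5, 4, 8],
    'Order': [2, 1, 1],
})
N = 10

ordered = df.sort_values(by=['Order', 'Name'])   # index order: 2, 1, 0
running = ordered['Quantity'].cumsum()           # total requested so far
capped = running.clip(upper=N)                   # total granted so far
allocated = capped.diff().fillna(capped)         # granted to each row

# Assignment aligns on the index, so each value lands on its original row
df['Result'] = allocated.astype(int)
print(df)
```

The running total is what would have been handed out with an unlimited N; capping it at N gives what has actually been handed out so far, and the row-to-row difference of that capped total is each row's share.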

CodePudding user response：

A possible solution:

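import numpy as np

N = 10

df = df.sort_values(by=['Order', 'Name'])
# what is left of N after each row's request (zero or negative once N is used up)
df['Result'] = (N - df['Quantity'].cumsum())
# 0 while something is left over after a row, 1 on the first row that uses N up, 2+ afterwards
cond = df['Result'].le(0).cumsum()
df['Result'] = np.where(
    cond.eq(1), df['Quantity'].add(df['Result']),
    np.where(cond.eq(0), df['Quantity'], 0))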

Output:

  Name  Quantity  Order  Result
0   a1         8      1       8
1   a2         4      1       2
2   a1         5      2       0