I have a data frame like this:
import pandas as pd
df = pd.DataFrame(data={'Name': ['a1', 'a2', 'a1'], 'Quantity': [8, 4, 5], 'Order': [1, 1, 2], 'Result': [0, 0, 0]})
Name | Quantity | Order | Result |
---|---|---|---|
a1 | 8 | 1 | 0 |
a2 | 4 | 1 | 0 |
a1 | 5 | 2 | 0 |
I would like to distribute a number, N = 10, over the rows of the data frame according to:
- Quantity -> each row should receive, when possible, at most what it asks for in Quantity
- Order -> rows are served in increasing Order, that is, the order inside each group of Name
The desired result should be:
Name | Quantity | Order | Result |
---|---|---|---|
a1 | 8 | 1 | 8 |
a2 | 4 | 1 | 2 |
a1 | 5 | 2 | 0 |
That is, I first satisfy Name = a1, Order = 1 by giving it 8, then Name = a2, Order = 1 by giving it only 2, since that is all that remains of the initial N = 10. Name = a1, Order = 2 gets nothing.
The only way I have found to solve this problem is:
- An initial groupby on Name, to get the key names of all the groups (only two, a1 and a2, in the example)
- Two nested for loops, the inner over Name and the outer over Order, running until N = 0
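For reference, the loop-based approach described above might look roughly like the sketch below. It is only an illustration of the two steps, not the original code: the variable names `remaining`, `names`, and `give` are made up here, and it assumes at most one row per (Name, Order) pair.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['a1', 'a2', 'a1'],
    'Quantity': [8, 4, 5],
    'Order': [1, 1, 2],
    'Result': [0, 0, 0],
})

N = 10
remaining = N  # hypothetical name: what is still left to hand out

# Step 1: group keys from a groupby on Name (here: a1 and a2)
names = list(df.groupby('Name').groups)

# Step 2: outer loop over Order, inner loop over Name
for order in sorted(df['Order'].unique()):
    for name in names:
        mask = (df['Order'] == order) & (df['Name'] == name)
        if remaining == 0 or not mask.any():
            continue
        # Assumption: at most one row per (Name, Order) pair
        give = min(int(df.loc[mask, 'Quantity'].iloc[0]), remaining)
        df.loc[mask, 'Result'] = give
        remaining -= give

print(df)
```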
Is there a way to avoid the for loops by using some pandas vectorised functions?
CodePudding user response:
You can use:
N = 10
# ensure rows are sorted by priority and cumsum
s = df.sort_values(by=['Order', 'Name'])['Quantity'].cumsum()
# cap the running total at N, then undo the cumsum with diff
c = s.clip(upper=N)
# the first row has no predecessor, so diff gives NaN there: fill it with the capped value
df['Result'] = c.diff().fillna(c).astype(int)
Output:
Name Quantity Order Result
0 a1 8 1 8
1 a2 4 1 2
2 a1 5 2 0
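To see why this works, it may help to look at the intermediate series. The sketch below rebuilds the example frame and collects each step in a throwaway frame; the names `capped` and `steps` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['a1', 'a2', 'a1'],
    'Quantity': [8, 4, 5],
    'Order': [1, 1, 2],
})
N = 10

# Running total of requests, in priority order: 8, 12, 17
s = df.sort_values(by=['Order', 'Name'])['Quantity'].cumsum()

# Running total of what can actually be handed out: 8, 10, 10
capped = s.clip(upper=N)

# Row-by-row increments of the capped total: NaN, 2, 0
# (the NaN in the first row is why fillna is needed afterwards)
steps = pd.DataFrame({'cumsum': s, 'capped': capped, 'diff': capped.diff()})
print(steps)
```

Once the running total hits N it stops growing, so every later row has a difference of 0.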
CodePudding user response:
A possible solution:
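import numpy as np
N = 10
df = df.sort_values(by=['Order', 'Name'])
# what is left of N after serving each row (negative once N is exceeded)
df['Result'] = (N - df['Quantity'].cumsum())
# 0 while N is not exhausted, 1 on the row that exhausts it, >1 afterwards
cond = df['Result'].le(0).cumsum()
df['Result'] = np.where(
    cond.eq(1), df['Quantity'].add(df['Result']),
    np.where(cond.eq(0), df['Quantity'], 0))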
Output:
Name Quantity Order Result
0 a1 8 1 8
1 a2 4 1 2
2 a1 5 2 0
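The same idea can also be written as a single clip, which is a different formulation from the two answers above rather than a restatement of either: compute how much of N is left before each row, then bound that between 0 and the row's Quantity. This is a minimal sketch; the helper name `distribute` and the variable `before` are hypothetical.

```python
import pandas as pd

def distribute(df, n):
    """Return a copy of df with Result filled by priority (Order, then Name)."""
    qty = df.sort_values(by=['Order', 'Name'])['Quantity']
    # Amount of n still available just before each row is served
    before = n - (qty.cumsum() - qty)
    # Nothing below 0, and never more than the row asks for
    result = before.clip(lower=0).clip(upper=qty)
    # assign aligns on the index, so the original row order is kept
    return df.assign(Result=result)

df = pd.DataFrame({
    'Name': ['a1', 'a2', 'a1'],
    'Quantity': [8, 4, 5],
    'Order': [1, 1, 2],
    'Result': [0, 0, 0],
})

print(distribute(df, 10))
```

Wrapping it in a function makes it easy to try other values of N, for example `distribute(df, 15)`.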