I have a dataframe with residuals and a dataframe with orders
ln [65]: residuals = pd.DataFrame({'area': ['pnos', 'vnp'], 'number': [3, 4], 'balances':[20,95]})
In [67]: orders = pd.DataFrame({'area': ['pnos', 'vnp','vnp','vnp', 'vnp'], 'number': [3, 4, 4, 4 1], 'requires':[40,70,20,10,25]})
In [68]:orders
Out[68]:
area number requires
0 pnos 3 40
1 vnp 4 70
2 vnp 4 20
3 vnp 4 10
4 vnp 1 25
In [69]: residuals
Out[69]:
area number balances
0 pnos 3 20
1 vnp 4 95
You can see area and number product that is a keys in orders by area - 'pnos' and number '3' requires 40 kg and we can see in dataframe 'residuals' and if it have area - 'pnos' and number - '3' and its balances more or equal than requires, so we set 20 and reduce in balances by this amount, else we just set 0
Visually, the result I want would look like this:
area number requires ready
0 pnos 3 40 20
1 vnp 4 70 70
2 vnp 4 20 20
3 vnp 4 10 5
4 vnp 1 25 0
In the third line (area-pnos,number-4) we set 5 because we have distributed the balance to the previous lines and now the balance is less than required
I solved it, but not in a real elegant way.
def distr(number, area, requiers):
if residuals[(residuals['number']==number) & (residuals['area']==area)].empty:
return 0
elif requiers==0:
return 0
elif requiers>0:
if residuals[(residuals['number']==number) & (residuals['area']==area)]['balances'].iloc[0] >= requiers:
residuals.loc[(residuals['number']==number) & (residuals['area']==area), 'balances'] -= requiers
return requiers
else:
result = residuals[(residuals['number']==number) &
(residuals['area']==area)]['balances'].iloc[0]
residuals.loc[(residuals['number']==number) & (residuals['area']==area), 'balances'] = 0
return result
In [118]: orders['ready'] = orders.apply(lambda x: distr(x['number'], x['area'], x['requires']), axis=1)
how I can distribute data more optimize? I have more than 20000 rows in excel and it calculate for quite a long time
CodePudding user response:
- First, you need to join the
residuals
to give the upper limit perarea
-number
pairs. - Second, calculated the
requires_cum
as the cumulative sum of therequires
, which is the desired amount from the orders - Third, calculate the
ready
as therequires
adjusted with the appropriate upper and lower bounds.
import pandas as pd
residuals = pd.DataFrame({'area': ['pnos', 'vnp'], 'number': [3, 4], 'balances':[20,95]})
orders = pd.DataFrame({'area': ['pnos', 'vnp','vnp','vnp', 'vnp'], 'number': [3, 4, 4, 4, 1], 'requires':[40,70,20,10,25]})
orders = pd.merge(orders, residuals, how="left")
orders.balances.fillna(0, inplace=True)
orders["requires_cum"] = orders.groupby(["area", "number"]).requires.cumsum()
orders["ready"] = orders.requires.clip(upper=orders.balances - orders.requires_cum orders.requires).clip(lower=0)
orders
yields:
area number requires balances requires_cum ready
0 pnos 3 40 20.0 40 20
1 vnp 4 70 95.0 70 70
2 vnp 4 20 95.0 90 20
3 vnp 4 10 95.0 100 5
4 vnp 1 25 0.0 25 0