So I want to spread the shipments per ID in the group one by one by looking at avg sales to determine who to give it to.
Here's my dataframe:
ID STOREID BAL SALES SHIP
1 STR1 50 5 18
1 STR2 6 7 18
1 STR3 74 4 18
2 STR1 35 3 500
2 STR2 5 4 500
2 STR3 54 7 500
While SHIP (grouped by ID) is greater than 0, calculate AVG (BAL/SALES) and the lowest AVG per group give 1 to its column BAL and 1 to its column final. And then repeat the process until SHIP is 0. The AVG would be different every stage which is why I wanted it to be a while loop.
Sample output of first round is below. So do this until SHIP is 0 and SUM of Final per ID is = to SHIP:
ID STOREID BAL SALES SHIP AVG Final
1 STR1 50 5 18 10 0
1 STR2 6 4 18 1.5 1
1 STR3 8 4 18 2 0
2 STR1 35 3 500 11.67 0
2 STR2 5 4 500 1.25 1
2 STR3 54 7 500 7.71 0
I've tried a couple of ways in SQL, I thought it would be better to do it in python but I haven't been doing a great job with my loop. Here's what I tried so far:
df['AVG'] = 0
df['FINAL'] = 0
for i in df.groupby(["ID"])['SHIP']:
if i > 0:
df['AVG'] = df['BAL'] / df['SALES']
df['SHIP'] = df.groupby(["ID"])['SHIP']-1
total = df.groupby(["ID"])["FINAL"].transform("cumsum")
df['FINAL'] = 1
df['A'] = 1
else:
df['FINAL'] = 0
CodePudding user response:
This was challenging because more than one row in the group can have the same average calculation. then it throws off the allocation.
This works on the example dataframe, if I understood you correctly.
d = {'ID': [1, 1, 1, 2,2,2], 'STOREID': ['str1', 'str2', 'str3','str1', 'str2', 'str3'],'BAL':[50, 6, 74, 35,5,54], 'SALES': [5, 7, 4, 3,4,7], 'SHIP': [18, 18, 18, 500,500,500]}
df = pd.DataFrame(data=d)
df['AVG'] = 0
df['FINAL'] = 0
def calc_something(x):
# print(x.iloc[0]['SHIP'])
for i in range(x.iloc[0]['SHIP'])[0:500]:
x['AVG'] = x['BAL'] / x['SALES']
x['SHIP'] = x['SHIP']-1
x = x.sort_values('AVG').reset_index(drop=True)
# print(x.iloc[0, 2])
x.iloc[0, 2] = x['BAL'][0] 1
x.iloc[0, 6] = x['FINAL'][0] 1
return x
df_final = df.groupby('ID').apply(calc_something).reset_index(drop=True).sort_values(['ID', 'STOREID'])
df_final
ID STOREID BAL SALES SHIP AVG FINAL
1 1 STR1 50 5 0 10.000 0
0 1 STR2 24 7 0 3.286 18
2 1 STR3 74 4 0 18.500 0
4 2 STR1 127 3 0 42.333 92
5 2 STR2 170 4 0 42.500 165
3 2 STR3 297 7 0 42.286 243