#Loop through example dataset
for id in df['val_id']:
#total assigned value
df['tot_val'] = 0
#loop through facilities
for fac in df['fac_id']:
if df['product'].isin(['XL', 'CL', 'DL']).all():
df['row'] = min(df['our_val_amt'] - df['tot_val'], df['val_against'])
else:
df['row'] = 0
df['tot_val'] = df['tot_val'] df['row']
df['row'] = df['row'] 1
Sample to test val_against > val_amt change
# df = pd.DataFrame(data=[["compx","xx1","yy1",424,418,"XL"],["compx","xx1","yy2",424,134,"CL"],["compx","xx2","yy3",472,60,"DL"],["compx","xx2","yy4",472,104,"CL"], ["compx", "xx3", "yy5", 490, 50, "XL"], ["compx", "xx3", "yy6", 490, 500, "CL"], ["compx", "xx3", "yy7", 490, 200, "DL"], ["compx", "xx4", "yy8", 510, 200, "CL"], ["compx", "xx4", "yy9", 510, 300, "CL"], ["compx", "xx4", "yy10", 510, 50, "XL"]], columns=["name","val_id","fac_id","our_val_amt","val_against","product"])
I am trying to in Pandas double loop through the 'val_id' and 'fac_id' column and create a new field from the following conditions.
1.Within each 'val_id' loop if 'product' == 'CL' then min of 'val_against' and 'our_val_amt' e.g. min( val_against (134), our_val_amt (424)) therefore 'NEW FIELD' = 134. Also if the sum of NEW FIELD exceeds "our_val_amt", then subtract it from "our_val_amt". e.g. for val_id "xx4", (200 300 50) = 550 which exceeds our_val_amt=510, so NEW FILED = 550 - 510 = 10.
2.If product != 'CL' and is in the same 'val_id' group. The remainder to be subtracted from 'our_val_amt' to be inserted in 'NEW FIELD'. e.g 'our_val_amt' (424) - from step 1 (134) = 290. This inserted above 'NEW FIELD'.
3.Repeat steps for val_id xx2. NEW FIELD calculation for CL = 104 and XL = 472 - 104 = 368.
Dataset with computed fields labeled with **
| name | val_id | fac_id | our_val_amt | val_against | product | **row** | **totval**|**Field Want**
| compx | xx1 | yy1 | 424 | 418 | XL | 290 | 0 | 290
| compx | xx1 | yy2 | 424 | 134 | CL | 134 | 134 | 134
| compx | xx2 | yy3 | 472 | 60 | DL | 0 | 0 | 368
| compx | xx2 | yy4 | 472 | 104 | CL | 104 | 104 | 104
| compx | xx3 | yy5 | 490 | 50 | XL |
| compx | xx3 | yy6 | 490 | 500 | CL |
| compx | xx3 | yy7 | 490 | 200 | DL |
| compx | xx4 | yy8 | 510 | 200 | CL |
| compx | xx4 | yy9 | 510 | 300 | CL |
| compx | xx4 | yy10 | 510 | 50 | CL |
Expected Output
| name | val_id | fac_id | our_val_amt | val_against | product | new field |
| compx | xx1 | yy1 | 424 | 418 | XL | 290 |
| compx | xx1 | yy2 | 424 | 134 | CL | 134 |
| compx | xx2 | yy3 | 472 | 60 | DL | 368 |
| compx | xx2 | yy4 | 472 | 104 | CL | 104 |
| compx | xx3 | yy5 | 490 | 50 | XL | 0 |
| compx | xx3 | yy6 | 490 | 500 | CL | 490 |
| compx | xx3 | yy7 | 490 | 200 | DL | 0 |
| compx | xx4 | yy8 | 510 | 200 | CL | 200 |
| compx | xx4 | yy9 | 510 | 300 | CL | 300 |
| compx | xx4 | yy10 | 510 | 50 | CL | 10 |
The **row** and ** totval** are fields i created trying to create the field that i want with the logic above.
I've tried grouping the variables together and having the 'product' as columns but unfortunately i receive 6000 columns and have no idea on how to compute the calculation for this scenario.
I've also attempted to loop through but received incorrect outputs / errors
error: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
CodePudding user response:
You can solve this in 2 steps:
- In Step 1, compute the new field values for "CL". Following solution assumes
0
value fornew_field
if computing condition is not met. - In Step 2, process non-CL records and use values computed in step 1. To handle case where multiple "CL" records may be present in same val_id group, the following code sums them up.
The comments are added before each step to explain what it does.
df = pd.DataFrame(data=[["compx","xx1","yy1",424,418,"XL"],["compx","xx1","yy2",424,134,"CL"],["compx","xx2","yy3",472,60,"DL"],["compx","xx2","yy4",472,104,"CL"],
["compx","xx3","yy5",490,50,"XL"],["compx","xx3","yy6",490,500,"CL"],["compx","xx3","yy7",490,200,"DL"],
["compx","xx4","yy8",510,200,"CL"],["compx","xx4","yy9",510,300,"CL"],["compx","xx4","yy10",510,50,"CL"],
], columns=["name","val_id","fac_id","our_val_amt","val_against","product"])
# Compute tuple of "our_val_amt", "val_against" and "product" for easy processing as one column. It is hard to process multiple columns with "transform()".
df["the_tuple"] = df[["our_val_amt", "val_against", "product"]].apply(tuple, axis=1)
def compute_new_field_for_cl(g):
# df_g is a tuple ("our_val_amt", "val_against", "product") indexed as (0, 1, 2).
df_g = g.apply(pd.Series)
df_g["new_field"] = df_g.apply(lambda row: min(row[0], row[1]) if row[2] == "CL" else 0, axis=1)
df_g["cumsum"] = df_g["new_field"].cumsum()
df_g["new_field"] = df_g.apply(lambda row: 0 if row["cumsum"] > row[0] else row["new_field"], axis=1)
df_g["max_cumsum"] = df_g["new_field"].cumsum()
df_g["new_field"] = df_g.apply(lambda row: row[0] - row["max_cumsum"] if row["cumsum"] > row[0] else row["new_field"], axis=1)
return df_g["new_field"]
# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_cl)
# Re-compute tuple of "our_val_amt", "new_field" and "product".
df["the_tuple"] = df[["our_val_amt", "new_field", "product"]].apply(tuple, axis=1)
def compute_new_field_for_not_cl(g):
# df_g is a tuple ("our_val_amt", "new_field", "product") indexed as (0, 1, 2).
df_g = g.apply(pd.Series)
result_sr = df_g.where(df_g[2] != "CL")[0] - df_g[df_g[2] == "CL"][1].sum()
result_sr = result_sr.fillna(0) df_g[1]
return result_sr
# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_not_cl)
df = df.drop("the_tuple", axis=1)
print(df)
Output:
name val_id fac_id our_val_amt val_against product new_field
0 compx xx1 yy1 424 418 XL 290.0
1 compx xx1 yy2 424 134 CL 134.0
2 compx xx2 yy3 472 60 DL 368.0
3 compx xx2 yy4 472 104 CL 104.0
4 compx xx3 yy5 490 50 XL 0.0
5 compx xx3 yy6 490 500 CL 490.0
6 compx xx3 yy7 490 200 DL 0.0
7 compx xx4 yy8 510 200 CL 200.0
8 compx xx4 yy9 510 300 CL 300.0
9 compx xx4 yy10 510 50 CL 10.0