I have a dataframe with values of different attributes, and i have a json file which have a list of attributes to sum only if a column in the dataframe contains a string.
| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First | NY-store1 | 3 | 5 | 2 | 2 | |
| Second | NY-store2 | 1 | 3 | 5 | 1 | |
| Third | NJ-store1 | 3 | 5 | 2 | 2 | |
| Fourth | PA-store1 | 1 | 3 | 5 | 1 | |
The json file has this structure:
{
"positionEvaluation": [
{
"position": "Global",
"sumElements": ["attr1", "attr2"],
"gralSum": ["attr2", "attr3", "attr4"],
"elementsProm": ["attr1", "attr2", "attr3", "attr4"]
}
]
}
Obviously the real file has more attributes, only for demo. So, I want when the product has in the store location the string 'NY' take respective attributes of "sumElements" and divide by the length of "gralSum", and if the product has another string like 'NJ' or 'PA' just sum all elements of "elementsProm" and then divide by the length of it.
Here my code:
for p in range(len(js_positions["positionEvaluation"])):
aux1_string = js_positions["positionEvaluation"][p]["position"]
df[aux1_string] = 0
if df['store_location'].str.contains('NY').any():
for k in range(len(js_positions["positionEvaluation"][p]["sumElements"])):
tmp = js_positions["positionEvaluation"][p]["sumElements"][k]
df[aux1_string] = df[aux1_string] df[tmp_for_gk]
df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["gralSum"])
else:
for k in range(len(js_positions["positionEvaluation"][p]["elementsProm"])):
tmp = js_positions["positionEvaluation"][p]["elementsProm"][k]
df[aux1_string] = df[aux1_string] df[tmp]
df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["elementsProm"])
Explicit list:
sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]
Expected output:
| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First | NY-store1 | 3 | 5 | 2 | 2 | 2,66 |
| Second | NY-store2 | 1 | 3 | 5 | 1 | 1,33 |
| Third | NJ-store1 | 3 | 5 | 2 | 2 | 3 |
| Fourth | PA-store1 | 1 | 3 | 5 | 1 | 2,5 |
CodePudding user response:
IIUC, you want to sum different attribute whether or not the string NY is in the store name?
For this you can use boolean indexing and mean
or sum
:
sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]
df['global'] = np.where(df['store_location'].str.contains('NY'),
df[sumElements].sum(1).div(len(gralSum)),
df[elementsProm].mean(1))
output:
Product store_location attr1 attr2 attr3 attr4 global
0 First NY-store1 3 5 2 2 2.666667
1 Second NY-store2 1 3 5 1 1.333333
2 Third NJ-store1 3 5 2 2 3.000000
3 Fourth PA-store1 1 3 5 1 2.500000