Home > Net >  Sum values to column if another have a string Pandas
Sum values to column if another have a string Pandas

Time:05-10

I have a dataframe with values of different attributes, and i have a json file which have a list of attributes to sum only if a column in the dataframe contains a string.

| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First   | NY-store1      | 3     | 5     |  2    | 2     |        |
| Second  | NY-store2      | 1     | 3     |  5    | 1     |        |
| Third   | NJ-store1      | 3     | 5     |  2    | 2     |        |
| Fourth  | PA-store1      | 1     | 3     |  5    | 1     |        |

The json file has this structure:

{
"positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
]
}

Obviously the real file has more attributes, only for demo. So, I want when the product has in the store location the string 'NY' take respective attributes of "sumElements" and divide by the length of "gralSum", and if the product has another string like 'NJ' or 'PA' just sum all elements of "elementsProm" and then divide by the length of it.

Here my code:

for p in range(len(js_positions["positionEvaluation"])):
    aux1_string = js_positions["positionEvaluation"][p]["position"]
    df[aux1_string] = 0
    if df['store_location'].str.contains('NY').any():
        for k in range(len(js_positions["positionEvaluation"][p]["sumElements"])):
            tmp = js_positions["positionEvaluation"][p]["sumElements"][k]
            df[aux1_string] = df[aux1_string]   df[tmp_for_gk]

        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["gralSum"])

    else:
        for k in range(len(js_positions["positionEvaluation"][p]["elementsProm"])):
            tmp = js_positions["positionEvaluation"][p]["elementsProm"][k]
            df[aux1_string] = df[aux1_string]   df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["elementsProm"])

Explicit list:

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

Expected output:

| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First   | NY-store1      | 3     | 5     |  2    | 2     |  2,66  |
| Second  | NY-store2      | 1     | 3     |  5    | 1     |  1,33  |
| Third   | NJ-store1      | 3     | 5     |  2    | 2     |    3   |
| Fourth  | PA-store1      | 1     | 3     |  5    | 1     |   2,5  |

CodePudding user response:

IIUC, you want to sum different attribute whether or not the string NY is in the store name?

For this you can use boolean indexing and mean or sum:

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

df['global'] = np.where(df['store_location'].str.contains('NY'),
                        df[sumElements].sum(1).div(len(gralSum)),
                        df[elementsProm].mean(1))

output:

  Product store_location  attr1  attr2  attr3  attr4    global
0   First      NY-store1      3      5      2      2  2.666667
1  Second      NY-store2      1      3      5      1  1.333333
2   Third      NJ-store1      3      5      2      2  3.000000
3  Fourth      PA-store1      1      3      5      1  2.500000
  • Related