Alternatives to multiple nested if elif statements

I have a data frame with four columns whose values range from 0 to 100. In a new column I want to assign a value dependent on the values in the first four columns.

The values from the first four columns will be assigned a number 0, 1 or 2 and then summed together as follows:

0 - 30 = 0

31 - 70 = 1

71 - 100 = 2

So the maximum number in the fifth column will be 8 and the minimum 0.

In the example data frame below, the fifth column should contain 3 and 6. (Just in case I haven't described this clearly.)

I'm still very new to Python, and at this stage the only string I have in my bow is a very long and cumbersome nested if statement, followed by df['E'] = df.apply().

My question is: what is the best and most efficient function/method for populating the fifth column?

import pandas as pd

data = {
    'A': [50, 90],
    'B': [2, 4],
    'C': [20, 80],
    'D': [75, 72],
}

df = pd.DataFrame(data)
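
For comparison, the if/elif-plus-apply approach described above might look something like the sketch below. The helper name score is hypothetical and the band edges follow the rules listed earlier; it is a baseline to compare the vectorized answers against, not the actual code from the question.

```python
import pandas as pd


def score(value):
    """Map one 0-100 value to 0, 1 or 2 (hypothetical helper)."""
    if value <= 30:
        return 0
    elif value <= 70:
        return 1
    return 2


df = pd.DataFrame({'A': [50, 90], 'B': [2, 4], 'C': [20, 80], 'D': [75, 72]})

# Score the four values in each row and add the scores together.
df['E'] = df[['A', 'B', 'C', 'D']].apply(
    lambda row: sum(score(v) for v in row), axis=1)
print(df['E'].tolist())  # [3, 6]
```

This calls a Python function once per cell, so it is likely to be slower than a vectorized approach on large frames.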

CodePudding user response:

Edit

A more comprehensive method with np.select:

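import numpy as np

# One boolean mask per score band; each mask has the same shape as df.
condlist = [(0 <= df) & (df <= 30),
            (31 <= df) & (df <= 70),
            (71 <= df) & (df <= 100)]
choicelist = [0, 1, 2]

# Score every cell, then sum the scores across each row.
df['E'] = np.select(condlist, choicelist).sum(axis=1)
print(df)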

# Output
    A  B   C   D  E
0  50  2  20  75  3
1  90  4  80  72  6
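
To see what np.select produces before the row sum, the intermediate array can be inspected. This is a sketch on the same example data; the names scores and row_totals are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [50, 90], 'B': [2, 4], 'C': [20, 80], 'D': [75, 72]})

condlist = [(0 <= df) & (df <= 30),
            (31 <= df) & (df <= 70),
            (71 <= df) & (df <= 100)]

# np.select returns a 2-D array holding one score per cell of df.
# Cells that match no condition (e.g. a non-integer such as 30.5)
# fall back to np.select's default value of 0.
scores = np.select(condlist, [0, 1, 2])
print(scores.tolist())      # [[1, 0, 0, 2], [2, 0, 2, 2]]

# Summing along axis=1 collapses each row to its total score.
row_totals = scores.sum(axis=1)
print(row_totals.tolist())  # [3, 6]
```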

Alternatively, use pd.cut after flattening your DataFrame into one column with melt:

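# include_lowest=True keeps a value of exactly 0 in the first bin;
# by default the bins are (0, 30], (30, 70], (70, 100].
df['E'] = pd.cut(pd.melt(df, ignore_index=False)['value'],
                 bins=[0, 30, 70, 100], labels=[0, 1, 2],
                 include_lowest=True) \
            .cat.codes.groupby(level=0).sum()
print(df)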

# Output:
    A  B   C   D  E
0  50  2  20  75  3
1  90  4  80  72  6
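
A note on the bin edges: pd.cut builds right-closed intervals by default, so with bins=[0, 30, 70, 100] a value of exactly 0 lies outside the first bin unless include_lowest=True is passed. The sketch below checks the boundary values; the names values, default_codes and inclusive_codes are introduced here for illustration.

```python
import pandas as pd

values = pd.Series([0, 30, 31, 70, 71, 100])

# Default: intervals are (0, 30], (30, 70], (70, 100], so 0 is not binned
# and its category code is -1 (the code pandas uses for missing).
default_codes = pd.cut(values, bins=[0, 30, 70, 100],
                       labels=[0, 1, 2]).cat.codes

# include_lowest=True closes the first interval on the left.
inclusive_codes = pd.cut(values, bins=[0, 30, 70, 100],
                         labels=[0, 1, 2], include_lowest=True).cat.codes

print(default_codes.tolist())    # [-1, 0, 1, 1, 2, 2]
print(inclusive_codes.tolist())  # [0, 0, 1, 1, 2, 2]
```

The example data contains no zeros, so both variants give the same totals there.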

Details:

>>> pd.melt(df, ignore_index=False)
  variable  value
0        A     50
1        A     90
0        B      2
1        B      4
0        C     20
1        C     80
0        D     75
1        D     72

>>> pd.cut(pd.melt(df, ignore_index=False)['value'],
...        bins=[0, 30, 70, 100], labels=[0, 1, 2],
...        include_lowest=True)
0    1
1    2
0    0
1    0
0    0
1    2
0    2
1    2
Name: value, dtype: category
Categories (3, int64): [0 < 1 < 2]
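
The remaining two steps, .cat.codes and groupby(level=0).sum(), can be sketched the same way. The names binned, codes and totals are introduced here for illustration. Note that .cat.codes returns the position of each category, which only matches the scores because the labels are 0, 1, 2 in that order.

```python
import pandas as pd

df = pd.DataFrame({'A': [50, 90], 'B': [2, 4], 'C': [20, 80], 'D': [75, 72]})

binned = pd.cut(pd.melt(df, ignore_index=False)['value'],
                bins=[0, 30, 70, 100], labels=[0, 1, 2],
                include_lowest=True)

# Integer code of each category; the index still holds the original row label.
codes = binned.cat.codes
print(codes.tolist())   # [1, 2, 0, 0, 0, 2, 2, 2]

# Group by that row label (index level 0) and sum to get one total per row.
totals = codes.groupby(level=0).sum()
print(totals.tolist())  # [3, 6]
```

Because totals is indexed by the original row labels, assigning it to df['E'] aligns each sum with the right row.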