Conditional column function if one of the features is True

Time:07-14

I have a DataFrame that holds measurements of an item, such as Length, Width and Height:

Length  Width   Height  Operator
102.67  49.53   19.69   Op-1
102.50  51.42   19.63   Op-1
95.37   52.25   21.51   Op-1
94.77   49.24   18.60   Op-1
104.26  47.90   19.46   Op-1

Using boxplot_stats from matplotlib, I'm trying to create a column named "Status" where a row is considered 'Defective' if any of its measurements (at least one of them) is above whishi or below whislo for that column.

So I've tried many possibilities, but none of them worked as expected:

feat = ['Length','Width','Height']
    
for item, feat in df:
    if df[feat].iloc[item] > boxplot_stats(df[feat])[0]['whishi'] | df[feat].iloc[item] < boxplot_stats(df[feat])[0]['whislo']:
        df['Status'] = 'Defective'
    else:
        df['Status'] = 'Perfect'

Another one:

def status(row):
    feat = ['Length','Width','Height']
    for feature in feat:
        if df.iloc[row][feat] > boxplot_stats(df[feat])[0]['whishi'] | df.iloc[row][feat] < boxplot_stats(df[feat])[0]['whislo']:
            val = 'Defective'
        else:
            val = 'Perfect'
        return val

df['status'] = df.apply(status, axis=1)

For reference, the boxplot_stats output is obtained like this:

In:
from matplotlib.cbook import boxplot_stats
boxplot_stats(df['Height'])

Out:
[{'mean': 20.29322,
  'iqr': 1.6674999999999969,
  'cilo': 20.192920598732098,
  'cihi': 20.4270794012679,
  'whishi': 23.39,
  'whislo': 17.37,
  'fliers': array([], dtype=float64),
  'q1': 19.475,
  'med': 20.31,
  'q3': 21.1425}]

So I access whishi like this:

In:
boxplot_stats(df['Height'])[0]['whishi']

Out:
23.39
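Since the same lookup is needed for all three features, the whisker bounds can be collected once up front. The following is only a sketch: the five-row `df` is rebuilt from the sample rows shown earlier (so the numbers will differ from the full dataset), and the names `features`, `stats` and `bounds` are made up for illustration.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Only the five sample rows from the question; the real df is larger,
# so these bounds will not match the ones printed above.
df = pd.DataFrame({
    "Length": [102.67, 102.50, 95.37, 94.77, 104.26],
    "Width": [49.53, 51.42, 52.25, 49.24, 47.90],
    "Height": [19.69, 19.63, 21.51, 18.60, 19.46],
})

features = ["Length", "Width", "Height"]  # hypothetical name

# boxplot_stats returns a list with one dict per dataset, hence the [0].
stats = {col: boxplot_stats(df[col].to_numpy())[0] for col in features}

# One row per feature, one column per whisker bound.
bounds = pd.DataFrame({
    "whislo": {col: s["whislo"] for col, s in stats.items()},
    "whishi": {col: s["whishi"] for col, s in stats.items()},
})
print(bounds)
```

With the bounds in one small table, `bounds.loc['Height', 'whishi']` gives the same kind of value as the lookup above without recomputing the statistics.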

The expected result is a column of strings with values 'Defective' or 'Perfect', which I'll later treat as 0 or 1.

CodePudding user response:

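You mixed up a few things there. In Python, | binds more tightly than > and <, so a > b | c < d is not evaluated as (a > b) or (c < d); and in your second attempt, df.apply(..., axis=1) passes each row as a pd.Series (not an integer position), while return val sits inside the loop and exits after the first feature. I'll explain my solution as best I can; maybe it helps you figure out where you went wrong.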

from matplotlib.cbook import boxplot_stats

def get_status(row):
    # df.apply(..., axis=1) calls get_status() once per row;
    # row is a pd.Series indexed by column name
    # print('row: ', row, '\n')  # <- uncomment to see what gets passed to the function
    for col in ['Length', 'Width', 'Height']:
        # each row has 3 values (3 columns) to check
        # as soon as the condition is True for one of them, return 'Defective'
        if (
            (row[col] > boxplot_stats(df[col])[0]['whishi'])
            or
            (row[col] < boxplot_stats(df[col])[0]['whislo'])
        ):
            return 'Defective'

    # reaching this point means the condition was False for all 3 columns
    return 'Perfect'

df['Status'] = df.apply(get_status, axis=1)
print(df)

   Length  Width  Height Operator     Status
0  102.67  49.53   19.69     Op-1    Perfect
1  102.50  51.42   19.63     Op-1    Perfect
2   95.37  52.25   21.51     Op-1  Defective
3   94.77  49.24   18.60     Op-1  Defective
4  104.26  47.90   19.46     Op-1    Perfect
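The row-wise function above recomputes boxplot_stats for every row and column, which can get slow on a large DataFrame. As a possible alternative, here is a vectorized sketch that computes the whiskers once per column and compares the whole frame at once. The data is made up for illustration (two rows contain an obvious outlier), the names `whislo`, `whishi`, `outside` and `StatusCode` are hypothetical, and mapping 'Defective' to 1 is an assumption since the question does not say which label should be 0 or 1.

```python
import numpy as np
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Made-up data: row 5 has an outlying Width, row 6 an outlying Length.
df = pd.DataFrame({
    "Length": [100.0, 101.0, 99.0, 100.5, 99.5, 100.2, 150.0],
    "Width": [50.0, 50.5, 49.5, 50.2, 49.8, 20.0, 50.1],
    "Height": [20.0, 20.2, 19.8, 20.1, 19.9, 20.0, 20.05],
})
features = ["Length", "Width", "Height"]

# Compute the whisker bounds once per column instead of once per row.
stats = {col: boxplot_stats(df[col].to_numpy())[0] for col in features}
whislo = pd.Series({col: s["whislo"] for col, s in stats.items()})
whishi = pd.Series({col: s["whishi"] for col, s in stats.items()})

# Compare every cell with its column's bounds; the Series index is
# aligned with the DataFrame columns.
outside = (
    df[features].lt(whislo, axis="columns")
    | df[features].gt(whishi, axis="columns")
)

# A row is defective if at least one feature is outside its whiskers.
defective = outside.any(axis=1)
df["Status"] = np.where(defective, "Defective", "Perfect")
df["StatusCode"] = defective.astype(int)  # assumption: 1 = Defective
print(df)
```

Here `outside` is a boolean frame with the same shape as the feature columns, so it also shows which measurement caused a row to be flagged.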
