I have a DataFrame that holds measurements of an item, namely Length, Width and Height:
Length  Width  Height  Operator
102.67  49.53   19.69  Op-1
102.50  51.42   19.63  Op-1
 95.37  52.25   21.51  Op-1
 94.77  49.24   18.60  Op-1
104.26  47.90   19.46  Op-1
Using boxplot_stats from matplotlib, I'm trying to create a column named "Status": if at least one of the measurements is above whishi or below whislo for its column, the item is considered 'Defective'.
So I've tried many possibilities, but none of them worked as expected:
feat = ['Length','Width','Height']
for item, feat in df:
    if df[feat].iloc[item] > boxplot_stats(df[feat])[0]['whishi'] | df[feat].iloc[item] < boxplot_stats(df[feat])[0]['whislo']:
        df['Status'] = 'Defective'
    else:
        df['Status'] = 'Perfect'
Another one:
def status(row):
    feat = ['Length','Width','Height']
    for feature in feat:
        if df.iloc[row][feat] > boxplot_stats(df[feat])[0]['whishi'] | df.iloc[row][feat] < boxplot_stats(df[feat])[0]['whislo']:
            val = 'Defective'
        else:
            val = 'Perfect'
    return val
df['status'] = df.apply(status, axis=1)
For reference, boxplot_stats is obtained like this:
In:
from matplotlib.cbook import boxplot_stats
boxplot_stats(df['Height'])
Out:
[{'mean': 20.29322,
  'iqr': 1.6674999999999969,
  'cilo': 20.192920598732098,
  'cihi': 20.4270794012679,
  'whishi': 23.39,
  'whislo': 17.37,
  'fliers': array([], dtype=float64),
  'q1': 19.475,
  'med': 20.31,
  'q3': 21.1425}]
So I access whishi like this:
In:
boxplot_stats(df['Height'])[0]['whishi']
Out:
23.39
The expected result is a column of strings with values 'Defective' or 'Perfect', which I'll later treat as 0 or 1.
CodePudding user response:
You mixed up a few things there. I'll explain my solution as best I can; maybe that helps you figure out where you went wrong.
def get_status(row):
    # df.apply(get_status, axis=1) calls this function once per row
    # row is a pd.Series
    # print('row: ', row, '\n')  <- uncomment to see what gets passed to the function
    for col in ['Length','Width','Height']:
        # each row has 3 values (3 columns) to check
        # as soon as the condition is True once, the function returns 'Defective'
        if (
            (row[col] > boxplot_stats(df[col])[0]['whishi'])
            or
            (row[col] < boxplot_stats(df[col])[0]['whislo'])
        ):
            return 'Defective'
    # if nothing was returned up to here (the condition was False 3 times), return 'Perfect'
    return 'Perfect'

df['Status'] = df.apply(get_status,axis=1)
print(df)
   Length  Width  Height Operator     Status
0  102.67  49.53   19.69     Op-1    Perfect
1  102.50  51.42   19.63     Op-1    Perfect
2   95.37  52.25   21.51     Op-1  Defective
3   94.77  49.24   18.60     Op-1  Defective
4  104.26  47.90   19.46     Op-1    Perfect
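As a side note, the row-wise version calls boxplot_stats again for every row and every column, which can get slow on a larger frame. Here is a rough sketch of a vectorized variant that computes the whiskers once per column and then compares whole columns at a time. It uses only the five sample rows from the question. The helper name add_status and the extra 0/1 column named Defective are my own assumptions, not something from the question, and which label maps to 1 is a guess you may want to flip.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Only the five sample rows shown in the question.
df = pd.DataFrame({
    'Length': [102.67, 102.50, 95.37, 94.77, 104.26],
    'Width': [49.53, 51.42, 52.25, 49.24, 47.90],
    'Height': [19.69, 19.63, 21.51, 18.60, 19.46],
    'Operator': ['Op-1'] * 5,
})

def add_status(frame, cols=('Length', 'Width', 'Height')):
    """Hypothetical helper: return a copy of frame with a 'Status' column."""
    cols = list(cols)
    # Compute the boxplot statistics once per column, not once per row.
    stats = {c: boxplot_stats(frame[c].to_numpy())[0] for c in cols}
    lo = pd.Series({c: stats[c]['whislo'] for c in cols})
    hi = pd.Series({c: stats[c]['whishi'] for c in cols})
    # lt/gt align the whisker Series with the column labels;
    # any(axis=1) is True if at least one measurement is out of range.
    outside = frame[cols].lt(lo, axis=1) | frame[cols].gt(hi, axis=1)
    out = frame.copy()
    out['Status'] = outside.any(axis=1).map({True: 'Defective', False: 'Perfect'})
    return out

result = add_status(df)
# Assumed encoding: 1 = 'Defective', 0 = 'Perfect'.
result['Defective'] = (result['Status'] == 'Defective').astype(int)
print(result)
```

Note the parentheses-free style here: the comparisons are method calls, so the | between them combines two boolean frames. In your attempts, a > b | c < d without parentheses is evaluated as a > (b | c) < d, because | binds tighter than the comparison operators.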