Outliers data
Given Data:
Color | Number
Green | 5.0
Red | 20.0
Green | 5.0
Green | 15.0
Green | 100.0
Red | 7.0
Red | 10.0
Red | 8.0
Green | 6.0
I want to take only the Number values where Color is "Green", then plot them and find their outliers. How do I do this?
CodePudding user response:
We may subset the dataset where the Color is "Green", select the 'Number' column, use boxplot, and extract the outliers:
boxplot(subset(Data, Color == "Green", select = Number)$Number)$out
# [1] 100
data
Data <- structure(list(Color = c("Green", "Red", "Green", "Green", "Green",
"Red", "Red", "Red", "Green"), Number = c(5L, 20L, 5L, 15L, 100L,
7L, 10L, 8L, 6L)), class = "data.frame", row.names = c(NA, -9L
))
CodePudding user response:
Get the values for the Green color:
GreenValues = Data[Data$Color=='Green',]
Then use boxplot.stats to get the outliers:
boxplot.stats(GreenValues$Number)$out
Hope it helps.
CodePudding user response:
Interestingly, I needed to do the same thing not so long ago. On the way I found a great video explaining just that: https://youtu.be/9aDHbRb4Bf8
Here's a JS function that I wrote based on that YouTube video:
function getOutliers(input) {
  // sort a copy ascending, so the caller's array keeps its original order
  // (sorting in place would make the returned indexes refer to the sorted array)
  const asc = arr => [...arr].sort((a, b) => a - b);
  const sum = arr => arr.reduce((a, b) => a + b, 0);
  const mean = arr => sum(arr) / arr.length;
  // sample standard deviation (not used by the IQR rule below)
  const std = (arr) => {
    const mu = mean(arr);
    const diffArr = arr.map(a => (a - mu) ** 2);
    return Math.sqrt(sum(diffArr) / (arr.length - 1));
  };
  // quantile with linear interpolation between neighbouring sorted values
  const quantile = (arr, q) => {
    const sorted = asc(arr);
    const pos = (sorted.length - 1) * q;
    const base = Math.floor(pos);
    const rest = pos - base;
    if (sorted[base + 1] !== undefined) {
      return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
    } else {
      return sorted[base];
    }
  };
  const q1 = quantile(input, .25);
  const q3 = quantile(input, .75);
  // interquartile range
  const range = q3 - q1;
  // values outside [min, max] are treated as outliers
  const min = q1 - 1.5 * range;
  const max = q3 + 1.5 * range;
  let smallOutlierIndexes = [];
  let smallOutliers = [];
  let largeOutlierIndexes = [];
  let largeOutliers = [];
  for (let i = 0; i < input.length; i++) {
    if (input[i] > max) {
      largeOutlierIndexes.push(i);
      largeOutliers.push(input[i]);
    }
    if (input[i] < min) {
      smallOutlierIndexes.push(i);
      smallOutliers.push(input[i]);
    }
  }
  return { smallOutliers, smallOutlierIndexes, largeOutliers, largeOutlierIndexes, min, max };
}
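To tie this back to the question, here is a minimal, self-contained sketch that first keeps only the "Green" rows and then applies the same 1.5 × IQR rule. The names data, quantile and outliersForColor are hypothetical and chosen for illustration. The rows are the question's table written as plain objects.

```javascript
// The question's table as an array of objects (hypothetical layout).
const data = [
  { color: "Green", number: 5 },
  { color: "Red", number: 20 },
  { color: "Green", number: 5 },
  { color: "Green", number: 15 },
  { color: "Green", number: 100 },
  { color: "Red", number: 7 },
  { color: "Red", number: 10 },
  { color: "Red", number: 8 },
  { color: "Green", number: 6 },
];

// Quantile with linear interpolation, computed on a sorted copy
// so the input array is never mutated.
function quantile(values, q) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q;
  const base = Math.floor(pos);
  const rest = pos - base;
  const next = sorted[base + 1];
  return next === undefined
    ? sorted[base]
    : sorted[base] + rest * (next - sorted[base]);
}

// Keep only the rows of one color, then flag values lying more than
// 1.5 * IQR below the first quartile or above the third quartile.
function outliersForColor(rows, color) {
  const values = rows.filter(r => r.color === color).map(r => r.number);
  const q1 = quantile(values, 0.25);
  const q3 = quantile(values, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - 1.5 * iqr;
  const upper = q3 + 1.5 * iqr;
  const outliers = values.filter(v => v < lower || v > upper);
  return { lower, upper, outliers };
}

const greenResult = outliersForColor(data, "Green");
console.log(greenResult); // { lower: -10, upper: 30, outliers: [ 100 ] }
```

For the Green values (5, 5, 15, 100, 6) the quartiles are 5 and 15, so the fences are -10 and 30 and only 100 is flagged, which matches the R answers above. R's boxplot uses hinges rather than interpolated quantiles, so the two approaches can give slightly different fences on other data.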