Outliers in certain values in column R

Time:12-05

Outliers data

Given Data:

Color  |   Number
Green   |  5.0 
Red     |  20.0
Green   |  5.0    
Green   |  15.0
Green   |  100.0
Red     |  7.0
Red     |  10.0
Red     |  8.0
Green   |  6.0


I want to take only the Number values for "Green", plot them, and find the outliers among them. How can I do this?

CodePudding user response:

We can subset the dataset to the rows where Color is "Green", select the 'Number' column, pass it to boxplot, and extract the outliers from the $out element of the result:

boxplot(subset(Data, Color == "Green", select = Number)$Number)$out
# [1] 100

data

Data <- structure(list(Color = c("Green", "Red", "Green", "Green", "Green", 
"Red", "Red", "Red", "Green"), Number = c(5L, 20L, 5L, 15L, 100L, 
7L, 10L, 8L, 6L)), class = "data.frame", row.names = c(NA, -9L
))
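
Since the question asks for both the plot and the outliers, here is a minimal sketch that keeps the two steps separate. It rebuilds the sample data as a plain data.frame; the names green, bp, green_outliers and outlier_rows are only illustrative and do not come from the answer above.

```r
# Sample data from the question
Data <- data.frame(
  Color  = c("Green", "Red", "Green", "Green", "Green",
             "Red", "Red", "Red", "Green"),
  Number = c(5, 20, 5, 15, 100, 7, 10, 8, 6)
)

# Keep only the Green rows
green <- Data[Data$Color == "Green", ]

# boxplot() draws the plot and invisibly returns the statistics it used
bp <- boxplot(green$Number, main = "Green", ylab = "Number")

# Values drawn as individual points beyond the whiskers
green_outliers <- bp$out

# Rows of the Green subset that hold those values
outlier_rows <- green[green$Number %in% green_outliers, ]
print(outlier_rows)
```

Storing the result of boxplot() in a variable gives you the plot and the outlier values from a single call, and the last step maps the values back to their rows in the original data.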

CodePudding user response:

Get the values for the Green color:

GreenValues = Data[Data$Color=='Green',]

Then use boxplot.stats to get the outliers:

boxplot.stats(GreenValues$Number)$out
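
If you are wondering what boxplot.stats does to decide this, the following is a rough sketch of the rule with its default coef = 1.5: a value counts as an outlier when it lies more than 1.5 times the distance between the hinges below the lower hinge or above the upper hinge. The variable names here (green, hinges, lower, upper, manual_out) are made up for illustration.

```r
# Green values from the question
green <- c(5, 5, 15, 100, 6)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
hinges <- fivenum(green)

# Distance between the hinges (the box height in the plot)
box_height <- hinges[4] - hinges[2]

# Fences at 1.5 box heights beyond each hinge
lower <- hinges[2] - 1.5 * box_height
upper <- hinges[4] + 1.5 * box_height

# Anything outside the fences is reported as an outlier
manual_out <- green[green < lower | green > upper]
print(manual_out)

# Should agree with the built-in result
print(boxplot.stats(green)$out)
```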

Hope it helps.

CodePudding user response:

Interestingly, I needed to do the same thing not so long ago. On the way I found a great video explaining just that: https://youtu.be/9aDHbRb4Bf8

Here's a JS function that I wrote based on that YouTube video:

function getOutliers(input) {
  // sort array ascending
  // sort ascending on a copy so the caller's array (and its indexes) stay untouched
  const asc = arr => [...arr].sort((a, b) => a - b);

  const sum = arr => arr.reduce((a, b) => a + b, 0);

  const mean = arr => sum(arr) / arr.length;

  // sample standard deviation
  const std = (arr) => {
    const mu = mean(arr);
    const diffArr = arr.map(a => (a - mu) ** 2);
    return Math.sqrt(sum(diffArr) / (arr.length - 1));
  };

  // quantile with linear interpolation between neighbouring sorted values
  const quantile = (arr, q) => {
    const sorted = asc(arr);
    const pos = (sorted.length - 1) * q;
    const base = Math.floor(pos);
    const rest = pos - base;
    if (sorted[base + 1] !== undefined) {
      return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
    } else {
      return sorted[base];
    }
  };

  const q1 = quantile(input, .25);
  const q3 = quantile(input, .75);
  const range = q3 - q1;
  // fences: values outside [min, max] are treated as outliers
  const min = q1 - 1.5 * range;
  const max = q3 + 1.5 * range;

  let smallOutlierIndexes = [];
  let smallOutliers = [];
  let largeOutlierIndexes = [];
  let largeOutliers = [];

  for (let i = 0; i < input.length; i++) {
    if (input[i] > max) {
      largeOutlierIndexes.push(i);
      largeOutliers.push(input[i]);
    }
    if (input[i] < min) {
      smallOutlierIndexes.push(i);
      smallOutliers.push(input[i]);
    }
  }

  return { smallOutliers, smallOutlierIndexes, largeOutliers, largeOutlierIndexes, min, max };
}
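
Since the question is about R, here is a rough R counterpart of the function above, offered as a sketch rather than a drop-in replacement. The name get_outliers and its k argument are invented for this example. It relies on quantile(), whose default (type 7) uses the same linear interpolation as the JS quantile helper. Note that R indexes start at 1, so the returned positions are one higher than the JS ones.

```r
# Hypothetical helper mirroring the JS getOutliers() above
get_outliers <- function(x, k = 1.5) {
  # First and third quartiles (default type 7 interpolation)
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]

  # Fences at k interquartile ranges beyond the quartiles
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr

  # Positions (1-based) of values outside the fences
  small <- which(x < lower)
  large <- which(x > upper)

  list(
    smallOutliers = x[small], smallOutlierIndexes = small,
    largeOutliers = x[large], largeOutlierIndexes = large,
    min = lower, max = upper
  )
}

# Green values from the question
res <- get_outliers(c(5, 5, 15, 100, 6))
print(res$largeOutliers)
print(res$largeOutlierIndexes)
```

One caveat: boxplot and boxplot.stats use the hinges from fivenum() instead of quantile(), so for some sample sizes the fences, and therefore the reported outliers, can differ slightly between the two approaches. For the five Green values they coincide.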