Loop boxplot.stats function over only selected columns in a dataset

Time:01-02

I am trying to figure out how to write a loop that prints the outliers of particular columns in a dataset. For example, if I have the columns x (factor), y (factor), z (numeric), and t (numeric), I only want it to run for z and t. For this I have drafted some code that is meant to evaluate whether the variable is numeric or integer and then compute the outliers.

for(i in df) {                                        
  print(boxplot.stats(df$z)$out)  
}

Any help on how to continue?

CodePudding user response:

I think what you are asking for is something like

varnames <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
for(i in 1:4) {                                        
    print(boxplot.stats(iris[,varnames[i]])$out)  
}

A few things changed. First, I use a vector of the column names. From what you said, I think you already have a way of building that.

Second, the for loop now iterates over an index into that vector. You could make it more dynamic by using the vector's length instead of the hard-coded 4.

Third, inside the loop, I reference the index, i, and use it to extract each of the columns in turn.
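
The more dynamic variant mentioned above might look like the following sketch. It assumes the columns of interest are exactly the numeric ones, picked with is.numeric; num_cols is a name chosen here for illustration only.

```r
# Assumption: "selected columns" means every numeric column of the data frame.
num_cols <- names(iris)[sapply(iris, is.numeric)]

# seq_along() follows the length of num_cols, so nothing is hard-coded.
for (i in seq_along(num_cols)) {
  cat(num_cols[i], ":\n")
  print(boxplot.stats(iris[, num_cols[i]])$out)
}
```

If the columns should be chosen by hand instead, num_cols can simply be replaced with a character vector of names, as in varnames above.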

CodePudding user response:

As the number of outliers will differ between columns, you might prefer to collect them in a list for later reference:

## example data iris:
iris |> lapply(\(col) boxplot(col, plot = FALSE)$out)

output:

$Sepal.Length
numeric(0)

$Sepal.Width
[1] 4.4 4.1 4.2 2.0

$Petal.Length
numeric(0)

$Petal.Width
numeric(0)

$Species
numeric(0)
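
Once the outliers are stored in a list, they can be looked up by column name later. A small sketch, assuming the result is kept under the illustrative name outs and that the factor column is dropped first with Filter (a choice made here, not part of the one-liner above):

```r
# Keep only numeric columns, then collect each column's outliers in a named list.
outs <- lapply(Filter(is.numeric, iris), function(col) boxplot.stats(col)$out)

outs$Sepal.Width   # outliers of one column, looked up by name
lengths(outs)      # number of outliers found per column
```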

CodePudding user response:

Here is a function that first finds the numeric columns and then computes those columns' outliers.

fun <- function(x) {
  i <- sapply(x, is.numeric)
  if(any(i))
    lapply(x[i], \(y) boxplot.stats(y)$out)
}

fun(iris)
# $Sepal.Length
# numeric(0)
#
# $Sepal.Width
# [1] 4.4 4.1 4.2 2.0
#
# $Petal.Length
# numeric(0)
#
# $Petal.Width
# numeric(0)

If there are no numeric columns, the function returns NULL invisibly.

fun(data.frame(X = letters))   # doesn't print the invisible return value

res <- fun(data.frame(X = letters))
res
# NULL
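
If an invisible NULL is inconvenient for the calling code, one possible alternative is to always return a list, which is simply empty when there are no numeric columns. This is only a sketch; fun2 is a hypothetical name, not part of the answer above.

```r
# Hypothetical variant of fun(): always returns a list, possibly of length 0.
fun2 <- function(x) {
  lapply(Filter(is.numeric, x), function(y) boxplot.stats(y)$out)
}

res2 <- fun2(data.frame(X = letters))
length(res2)   # 0, because there are no numeric columns
```

With this shape, downstream code can loop over the result or call lengths() on it without first checking for NULL.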