Loop boxplot.stats function over only selected columns in a dataset

I am trying to figure out how to write a loop that prints the outliers of particular columns in a dataset. For example, if I have the columns x (factor), y (factor), z (numeric), and t (numeric), I only want it to run for z and t. My idea is to write code that checks whether each variable is numeric or integer and then computes the outliers. This is what I have so far:

for(i in df) {                                        
  print(boxplot.stats(df$z)$out)  
}

Any help on how to continue?

CodePudding user response：

I think what you are asking for is something like

varnames <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
for(i in 1:4) {                                        
    print(boxplot.stats(iris[,varnames[i]])$out)  
}

A few things changed. First, I have a vector of the column names. From what you said, I think you already have a way of building that.

Second, the for loop now runs over an index into that vector. You could make it more dynamic by using the length of the vector instead of a hard-coded 4.

Third, inside the loop, I reference the index, i, and use it to extract each of the columns in turn.
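
To avoid hard-coding the column names and the loop length, the numeric columns can be detected at run time. The following is only a sketch of that idea, using iris as a stand-in for your data; the names df, num_cols and outliers are made up for illustration.

```r
# Sketch: loop only over the numeric columns, detected at run time.
df <- iris  # stand-in for the real data frame

# is.numeric() is TRUE for double and integer columns, FALSE for factors
num_cols <- names(df)[sapply(df, is.numeric)]

outliers <- list()
for (nm in num_cols) {
  # store the outliers under the column name, then print them
  outliers[[nm]] <- boxplot.stats(df[[nm]])$out
  cat(nm, ":", outliers[[nm]], "\n")
}
```

Looping over the names rather than over 1:4 means the loop keeps working if columns are added or removed, and the factor column Species is skipped automatically.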

CodePudding user response：

As the number of outliers will differ from column to column, you might prefer to collect them in a list for later reference:

## example data iris:
iris |> lapply(\(col) boxplot(col, plot = FALSE)$out)

output:

$Sepal.Length
numeric(0)

$Sepal.Width
[1] 4.4 4.1 4.2 2.0

$Petal.Length
numeric(0)

$Petal.Width
numeric(0)

$Species
numeric(0)

CodePudding user response：

Here is a function that first finds the numeric columns and then computes those columns' outliers.

fun <- function(x) {
  i <- sapply(x, is.numeric)
  if(any(i))
    lapply(x[i], \(y) boxplot.stats(y)$out)
}

fun(iris)
# $Sepal.Length
# numeric(0)
#
# $Sepal.Width
# [1] 4.4 4.1 4.2 2.0
#
# $Petal.Length
# numeric(0)
#
# $Petal.Width
# numeric(0)

If there are no numeric columns, the function returns NULL invisibly.

fun(data.frame(X = letters))   # doesn't print the invisible return value

res <- fun(data.frame(X = letters))
res
# NULL
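
If you only care about the columns that actually have outliers, the empty entries can be dropped afterwards. This is a possible follow-up step rather than part of the function above; the name with_outliers is made up for illustration, and fun is repeated so the snippet runs on its own.

```r
# Same function as above, repeated so this snippet is self-contained
fun <- function(x) {
  i <- sapply(x, is.numeric)
  if (any(i))
    lapply(x[i], \(y) boxplot.stats(y)$out)
}

res <- fun(iris)

# Filter() keeps the elements for which length() is non-zero,
# i.e. the columns that have at least one outlier
with_outliers <- Filter(length, res)
names(with_outliers)
```

For iris this leaves only Sepal.Width, the one column with a non-empty result in the output shown earlier.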