I am trying to figure out how to write a loop that prints the outliers of particular columns in a dataset. For example, if I have the columns x (factor), y (factor), z (numeric), and t (numeric), I would only want it to run for z and t. My idea is code that checks whether each variable is numeric or integer and then computes the outliers. This is what I have so far:
for(i in df) {
print(boxplot.stats(df$z)$out)
}
Any help on how to continue?
CodePudding user response:
I think what you are asking for is something like
varnames <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
for(i in 1:4) {
print(boxplot.stats(iris[,varnames[i]])$out)
}
A few things changed. First, I have a vector of the column names; from what you said, I think you already have a way of getting that.
Second, the for loop now has an index that runs over that vector. You could make it more dynamic by using the length of the vector instead of a hard-coded 4.
Third, inside the loop, I reference the index, i, and use it to extract each of the columns in turn.
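As a rough sketch of the "more dynamic" variant mentioned above: the names can be derived from the data with is.numeric(), and seq_along() can replace the fixed 1:4. The cat() header line is only my addition so the printed vectors can be told apart; it is not part of the answer's code.

```r
# Pick out the numeric columns instead of typing their names by hand.
varnames <- names(iris)[sapply(iris, is.numeric)]

# seq_along() adapts to however many names were found,
# and yields an empty loop if there are none.
for (i in seq_along(varnames)) {
  cat(varnames[i], ":\n")  # label each result (illustrative addition)
  print(boxplot.stats(iris[, varnames[i]])$out)
}
```

With your own data, replacing iris with df should restrict the loop to z and t, assuming those are the only numeric columns.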
CodePudding user response:
As the number of outliers will differ between columns, you might prefer to collect them in a list for later reference:
## example data iris:
iris |> lapply(\(col) boxplot(col, plot = FALSE)$out)
output:
$Sepal.Length
numeric(0)
$Sepal.Width
[1] 4.4 4.1 4.2 2.0
$Petal.Length
numeric(0)
$Petal.Width
numeric(0)
$Species
numeric(0)
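To illustrate what "later reference" could look like, here is a small sketch that stores the list and then queries it. The variable name outliers is my own choice, not something from the answer.

```r
# Same call as above, stored so it can be reused.
outliers <- lapply(iris, \(col) boxplot(col, plot = FALSE)$out)

# Number of outliers found in each column.
lengths(outliers)

# Keep only the columns that actually contain outliers.
outliers[lengths(outliers) > 0]
```

For iris, only Sepal.Width should remain after the last step, matching the output shown above.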
CodePudding user response:
Here is a function that first finds the numeric columns and then computes the outliers of those columns.
fun <- function(x) {
i <- sapply(x, is.numeric)
if(any(i))
lapply(x[i], \(y) boxplot.stats(y)$out)
}
fun(iris)
# $Sepal.Length
# numeric(0)
#
# $Sepal.Width
# [1] 4.4 4.1 4.2 2.0
#
# $Petal.Length
# numeric(0)
#
# $Petal.Width
# numeric(0)
If there are no numeric columns, the function returns NULL invisibly.
fun(data.frame(X = letters)) # doesn't print the invisible return value
res <- fun(data.frame(X = letters))
res
# NULL
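The invisible NULL comes from the if() without an else branch: when the condition is FALSE, if() yields NULL invisibly and the function passes that on. One way to check this, as a sketch, is base R's withVisible(); the function is repeated here so the snippet runs on its own, and the name chk is only illustrative.

```r
fun <- function(x) {
  i <- sapply(x, is.numeric)
  if (any(i))
    lapply(x[i], \(y) boxplot.stats(y)$out)
}

# withVisible() reports both the value and whether it would auto-print.
chk <- withVisible(fun(data.frame(X = letters)))
is.null(chk$value)  # TRUE: nothing was computed
chk$visible         # FALSE: the result is not printed at top level
```

If an empty list is more convenient than NULL downstream, adding `else list()` to the if() would be one option, though that changes the function from what is shown above.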