Why do factors get coerced to a number when subsetting a data frame?

Time:05-14

I was trying to get the diagonal of the iris data set and wrote the following for loop:

diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  diagonal_list[j] <- iris[[j,j]]
}
diagonal_list

My output is:

[[1]]
[1] 5.1

[[2]]
[1] 3

[[3]]
[1] 1.3

[[4]]
[1] 0.2

[[5]]
[1] 1

But I want:

[[1]]
[1] 5.1

[[2]]
[1] 3

[[3]]
[1] 1.3

[[4]]
[1] 0.2

[[5]]
[1] setosa
Levels: setosa versicolor virginica

This should return a list of the diagonal elements, and because the 5th column of the iris data frame contains the species, the last element should be a factor. However, in my output the species is not a factor but simply 1 (a number). How can I make sure that my list contains the factor?
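The coercion can be reproduced without iris. The sketch below is minimal, and the object names `f` and `l` are only for illustration. It shows that single-bracket assignment into a list stores the factor's underlying integer code and drops its `class` and `levels` attributes:

```r
# A one-element factor, like iris[[5, 5]]
f <- factor("setosa", levels = c("setosa", "versicolor", "virginica"))

l <- list()
l[1] <- f            # single-bracket assignment into a list

is.factor(l[[1]])    # the stored element is no longer a factor
typeof(l[[1]])       # it is the factor's underlying integer code
```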

Answer:

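You have to wrap iris[[j,j]] in a list. With single-bracket assignment (diagonal_list[j] <- value), R converts the right-hand side into list elements one value at a time and drops its attributes, so a factor is reduced to its underlying integer code (1 for setosa). Wrapping the value in list() gives a one-element list whose only element is the factor itself, and that element is stored unchanged: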

diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  diagonal_list[j] <- list(iris[[j,j]])
}

str(diagonal_list)

List of 5
 $ : num 5.1
 $ : num 3
 $ : num 1.3
 $ : num 0.2
 $ : Factor w/ 3 levels "setosa","versicolor",..: 1
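To confirm that the Species entry kept its class and levels, each element can be checked. This is a small sketch that rebuilds `diagonal_list` with the `list()`-wrapped loop from this answer so it runs on its own. It relies only on the built-in `iris` data set:

```r
# Rebuild the list with the list()-wrapped assignment from this answer
diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  diagonal_list[j] <- list(iris[[j, j]])
}

# Only the fifth (Species) element should be a factor
vapply(diagonal_list, is.factor, logical(1))

# The levels of the original column are preserved
levels(diagonal_list[[5]])
```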

Answer:

The assignment in the for loop should use double brackets [[ on both sides. [[<- assigns exactly one element and stores the value as is, so the factor keeps its class and levels:

diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  diagonal_list[[j]] <- iris[[j,j]]
}
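Both fixes store the factor unchanged, so they should build the same list. A quick sketch comparing them with `identical()` follows. The names `via_list` and `via_double` are only for illustration:

```r
via_list <- list()
via_double <- list()
for (j in seq_len(ncol(iris))) {
  via_list[j] <- list(iris[[j, j]])   # wrap the value in a one-element list
  via_double[[j]] <- iris[[j, j]]     # assign a single element directly
}

identical(via_list, via_double)
```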

Another solution, which extracts the diagonal without a loop (the \(x) shorthand for function(x) requires R 4.1.0 or later):

lapply(seq_along(iris), \(x) iris[x, x])

Output:
[[1]]
[1] 5.1

[[2]]
[1] 3

[[3]]
[1] 1.3

[[4]]
[1] 0.2

[[5]]
[1] setosa
Levels: setosa versicolor virginica
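The lapply() version assumes the data frame has at least as many rows as columns. That holds for iris, which has 150 rows and 5 columns. Below is a sketch of a hypothetical helper, `df_diagonal()`, that is not part of base R. It stops at the shorter dimension and uses `function(i)` instead of `\(x)`, so it also runs on R versions older than 4.1.0:

```r
# Hypothetical helper (not a base R function): return the "diagonal" of a
# data frame as a list, keeping each column's class (e.g. factor) intact.
df_diagonal <- function(df) {
  n <- min(nrow(df), ncol(df))   # stop at the shorter dimension
  lapply(seq_len(n), function(i) df[[i, i]])
}

d <- df_diagonal(iris)
str(d)

# A data frame with fewer rows than columns yields a shorter list
small <- data.frame(a = 1:2, b = c("x", "y"), c = factor(c("p", "q")))
str(df_diagonal(small))
```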