I have the following tibble:
library(tibble)

test <- tibble(
  id = c("John", "Jacob", "Jingleheimer", "Schmidt"),
  score = c(2, 4, 6, 8)
)
The variable "score" is numeric. When I run the command is.numeric, I get the following:
> is.numeric(test$score)
[1] TRUE
But when I try to do the same thing, only this time referencing the column by its index, I get a different output:
> is.numeric(test[,2])
[1] FALSE
I'm confused as to why I'm getting different output from two versions of the same command. Why can't is.numeric detect the data type when I use indexing?
Answer:
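Indexing with [ works differently for tibbles and data frames: on a tibble, test[, 2] returns a one-column tibble rather than a vector, which is why is.numeric returns FALSE. Compare: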
library(dplyr)

test <- tibble(
  id = c("John", "Jacob", "Jingleheimer", "Schmidt"),
  score = c(2, 4, 6, 8)
)
class(test[, 2])
## [1] "tbl_df" "tbl" "data.frame"
class(as.data.frame(test)[, 2])
## [1] "numeric"
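The basic difference is that, when [ selects a single column, data frames default to drop = TRUE (the result is simplified to a vector) whereas tibbles default to drop = FALSE (the result stays a tibble).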
class(test[, 2])
## [1] "tbl_df" "tbl" "data.frame"
class(test[, 2, drop = FALSE]) # same
## [1] "tbl_df" "tbl" "data.frame"
class(test[, 2, drop = TRUE])
## [1] "numeric"
class(as.data.frame(test)[, 2])
## [1] "numeric"
class(as.data.frame(test)[, 2, drop = TRUE]) # same
## [1] "numeric"
class(as.data.frame(test)[, 2, drop = FALSE])
## [1] "data.frame"
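To tie this back to the original question, here is a minimal sketch of why is.numeric reports FALSE for the undropped result. It assumes the tibble package is installed; the names col_tbl and col_vec are made up for illustration.

```r
library(tibble)

test <- tibble(
  id = c("John", "Jacob", "Jingleheimer", "Schmidt"),
  score = c(2, 4, 6, 8)
)

col_tbl <- test[, 2]               # one-column tibble (drop = FALSE by default)
col_vec <- test[, 2, drop = TRUE]  # plain numeric vector

is.list(col_tbl)     # TRUE: a tibble is a list of columns underneath
is.numeric(col_tbl)  # FALSE: the container is tested, not the column inside it
is.numeric(col_vec)  # TRUE: the column itself is numeric
```

In other words, is.numeric is not failing to detect the type; it is being handed a one-column table instead of the column.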
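Also note the following ways to extract the column as a vector, or to check the class of every column at once: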
class(pull(test, 2))
## [1] "numeric"
class(test[[2]])
## [1] "numeric"
class(unlist(test[, 2]))
## [1] "numeric"
sapply(test, class)
## id score
## "character" "numeric"
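If the goal is to test column types by position, or across the whole tibble, something along these lines may be a reasonable starting point. It is only a sketch: vapply is base R, the tibble package is assumed to be installed, and num_cols is an illustrative name.

```r
library(tibble)

test <- tibble(
  id = c("John", "Jacob", "Jingleheimer", "Schmidt"),
  score = c(2, 4, 6, 8)
)

# Test a single column by position: [[ always returns the column itself
is.numeric(test[[2]])

# Test every column at once; logical(1) tells vapply to expect one TRUE/FALSE per column
num_cols <- vapply(test, is.numeric, logical(1))
num_cols

# Names of the numeric columns
names(test)[num_cols]
```

Using [[ instead of [ sidesteps the drop default entirely, so the same code behaves identically on tibbles and base data frames.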