I'm trying to understand how various objects in R are composed of atomic and generic vectors.
One can construct a data.frame out of a list by manually setting the attributes names, row.names, and class, see here.
I wonder how this might work for factors, which are internally represented as integer vectors. The solution I came up with is the following:
> f <- 1:3
> class(f) <- "factor"
> levels(f) <- c("low", "medium", "high")
Warning message:
In str.default(val) : 'object' does not have valid levels()
But for some reason this still looks different from a properly constructed factor:
> str(unclass(f))
int [1:3] 1 2 3
- attr(*, "levels")= chr [1:3] "low" "medium" "high"
> str(unclass(factor(c("low", "medium", "high"))))
int [1:3] 2 3 1
- attr(*, "levels")= chr [1:3] "high" "low" "medium"
Am I missing something? (I know this probably should not be used in production code, instead it is for educational purposes only.)
CodePudding user response:
The order matters.
f <- 1:3
levels(f) <- c("low", "medium", "high") ## mark
class(f) <- "factor"
f
# [1] low medium high
# Levels: low medium high
`levels<-` adds an attribute to the vector. Instead of the line marked ## mark, you could also do
attr(f, 'levels') <- c("low", "medium", "high")
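Both assignments can also be collapsed into a single call with structure(), which sets several attributes at once. A minimal sketch (the name f2 is only for illustration):

```r
# Build the factor in one step: integer codes plus 'levels' and 'class'
f2 <- structure(1:3, levels = c("low", "medium", "high"), class = "factor")

is.factor(f2)      # TRUE
levels(f2)         # "low" "medium" "high"
as.character(f2)   # "low" "medium" "high"
```

As with the step-by-step version, this bypasses the checks that factor() performs, so it is mainly useful for seeing how the object is put together.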
Here is what happens, step by step:
f <- 1:3
attributes(f)
# NULL
levels(f) <- c("low", "medium", "high")
attributes(f)
# $levels
# [1] "low" "medium" "high"
class(f) <- "factor"
attributes(f)
# $levels
# [1] "low" "medium" "high"
#
# $class
# [1] "factor"
Check with "automatic" factor generation.
attributes(factor(1:3, labels=c("low", "medium", "high")))
# $levels
# [1] "low" "medium" "high"
#
# $class
# [1] "factor"
And, importantly:
stopifnot(all.equal(unclass(f),
                    unclass(factor(1:3, labels=c("low", "medium", "high")))))
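This also accounts for the difference seen in the question: when factor() receives a character vector without a levels argument, it uses the sorted unique values as levels, so the codes follow alphabetical order rather than order of appearance. A short sketch (x and fx are illustrative names); passing levels explicitly should give the same codes as the hand-built object:

```r
x <- c("low", "medium", "high")

# Default: levels are the sorted unique values
levels(factor(x))        # "high" "low" "medium"
as.integer(factor(x))    # 2 3 1

# Explicit level order: codes follow the order given in 'levels'
fx <- factor(x, levels = c("low", "medium", "high"))
as.integer(fx)           # 1 2 3
```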
Note 1: the order of the values in f doesn't matter. Levels are identified by their index, so element n of the assigned levels vector becomes the n-th level, i.e. `1`='low', `2`='medium', `3`='high' in the following example.
f <- 3:1
levels(f) <- c("low", "medium", "high")
class(f) <- 'factor'
f
# [1] high medium low
# Levels: low medium high
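The lookup behind this output can be imitated by hand: strip the class and use the integer codes to index the levels vector. This is a rough sketch of the idea, not the internal implementation; codes and labs are illustrative names:

```r
f <- 3:1
levels(f) <- c("low", "medium", "high")
class(f) <- "factor"

codes <- unclass(f)          # integer codes, 'levels' attribute still attached
labs  <- levels(f)[codes]    # each code picks the label at that position
labs                         # "high" "medium" "low"
identical(labs, as.character(f))
```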
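Note 2: this only works if the values of f are integer codes between 1 and the number of levels (here 1, 2 and 3), because a factor is actually a labeled integer structure in which each value is an index into the levels vector. In the examples below, g contains the code 4, which has no matching level, and h is a double vector rather than an integer one.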
g <- 2:4
levels(g) <- c("low", "medium", "high")
class(g) <- 'factor'
g
# Error in as.character.factor(x) : malformed factor
h <- c(1, 3, 4)
levels(h) <- c("low", "medium", "high")
class(h) <- 'factor'
# Error in class(h) <- "factor" :
# adding class "factor" to an invalid object
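The two failures suggest a simple validity check to run before attaching the class. The helper below is hypothetical (valid_codes is not a base R function) and only sketches the conditions: the vector must be of integer type, and every non-missing value must lie between 1 and the number of levels.

```r
# Hypothetical helper: can 'x' serve as factor codes for the levels 'lev'?
valid_codes <- function(x, lev) {
  is.integer(x) && all(x >= 1L & x <= length(lev), na.rm = TRUE)
}

lev <- c("low", "medium", "high")
valid_codes(1:3, lev)             # TRUE
valid_codes(2:4, lev)             # FALSE: 4 has no matching level
valid_codes(c(1, 3, 4), lev)      # FALSE: double, not integer
valid_codes(c(3L, 3L, 1L), lev)   # TRUE: repeated codes are fine
```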