I know there are many versions of this question, but I am looking for a specific solution. When a dataframe has an existing character variable, is there an easy way to convert that variable to a factor with a chosen level order using the tidyverse format? For example, the 2nd line of code below won't reorder the factor levels, but the last line will. How do I make the 2nd line work? This would be useful when importing and modifying existing datasets. Many thanks!
df <- data.frame(x = c(1,2), y = c('post','pre')) %>%
as_factor(y, levels = c('pre','post'))
df$y <- factor(df$y, levels = c('pre', 'post'))
CodePudding user response:
We can use fct_relevel from forcats:
library(dplyr)
library(forcats)
df1 <- data.frame(x = c(1,2), y = c('post','pre')) %>%
mutate(y = fct_relevel(y, 'pre', 'post'))
-output
> df1$y
[1] post pre
Levels: pre post
Regarding the use of as_factor, according to the documentation:
"Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform."
i.e. post, followed by pre:
> as_factor(c('post','pre'))
[1] post pre
Levels: post pre
whereas the following calls will not work, because as_factor accepts neither extra positional arguments nor an argument named levels:
> as_factor(c('post','pre'), "pre", "post")
Error: 2 components of `...` were not used.
We detected these problematic arguments:
* `..1`
* `..2`
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
> as_factor(c('post','pre'), levels = c("pre", "post"))
Error: 1 components of `...` were not used.
We detected these problematic arguments:
* `levels`
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
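If the goal is to set the level order explicitly in a single piped step, one option is to call base factor() inside mutate, since factor() does accept a levels argument. This is only a minimal sketch; the name df2 is made up for illustration.

```r
library(dplyr)

# df2 is a hypothetical name used only for this sketch
df2 <- data.frame(x = c(1, 2), y = c('post', 'pre')) %>%
  mutate(y = factor(y, levels = c('pre', 'post')))  # explicit level order

levels(df2$y)
```

The values in y stay in their original row order; only the level order changes.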
Also, in the tidyverse, the pipe passes the whole data frame as the first argument, so we either need to extract the column with pull or .$, or else modify the column within mutate.
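A rough sketch of the two routes, assuming dplyr and forcats are installed (df3 and y_vec are illustrative names, not from the question). Extracting the column yields a standalone factor vector, while mutate keeps the result inside the data frame.

```r
library(dplyr)
library(forcats)

# df3 and y_vec are hypothetical names used only for this sketch
df3 <- data.frame(x = c(1, 2), y = c('post', 'pre'))

# Option 1: extract the column as a vector, then relevel the vector
y_vec <- df3 %>%
  pull(y) %>%
  fct_relevel('pre', 'post')

# Option 2: modify the column in place and keep the data frame
df3 <- df3 %>%
  mutate(y = fct_relevel(y, 'pre', 'post'))

levels(y_vec)
levels(df3$y)
```

Option 2 is usually the one to reach for when the data frame is carried further down the pipeline.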
CodePudding user response:
We could also use relevel:
df <- data.frame(x = c(1,2), y = c('post','pre'))
library(dplyr)
df <- df %>%
  mutate(y = relevel(as.factor(y), ref = 'pre'))  # relevel takes a single reference level
df$y
levels(df$y)
x y
1 1 post
2 2 pre
> df$y
[1] post pre
Levels: pre post
> levels(df$y)
[1] "pre" "post"
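Note that relevel only moves one reference level to the front and leaves the remaining levels in their existing order. With more than two levels, a full custom order can be given through factor() instead. The three-level vector below is a made-up example, not part of the original data.

```r
# Hypothetical three-level example, not from the original data
f <- factor(c('post', 'pre', 'baseline'))  # default levels are sorted alphabetically

# relevel moves only 'pre' to the front; the other levels keep their order
f_ref <- relevel(f, ref = 'pre')
levels(f_ref)

# For a complete custom order, pass levels to factor()
f_full <- factor(f, levels = c('baseline', 'pre', 'post'))
levels(f_full)
```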