Convert existing dataframe variable to factor in Tidyverse

I know there are many versions of this question, but I am looking for a specific solution. When you have an existing character variable in a dataframe, is there an easy method for converting that variable to a factor with a specified level order using the tidyverse format? For example, the second line of code below won't reorder the factor levels, but the last line will. How do I make the second line work? This would be useful in situations such as importing and modifying existing datasets. Many thanks!

df <- data.frame(x = c(1,2), y = c('post','pre')) %>%
      as_factor(y, levels = c('pre','post'))

df$y <- factor(df$y, levels = c('pre', 'post'))

CodePudding user response:

We can use fct_relevel from forcats

library(dplyr)
library(forcats)
df1 <- data.frame(x = c(1,2), y = c('post','pre')) %>% 
       mutate(y = fct_relevel(y, 'pre', 'post'))

-output

> df1$y
[1] post pre 
Levels: pre post

Regarding the use of as_factor, according to the documentation:

Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform.

i.e. post, followed by pre

> as_factor(c('post','pre'))
[1] post pre 
Levels: post pre

whereas the following options will not work as there is no argument named levels in as_factor

> as_factor(c('post','pre'), "pre", "post")
Error: 2 components of `...` were not used.

We detected these problematic arguments:
* `..1`
* `..2`

Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
> as_factor(c('post','pre'), levels = c("pre", "post"))
Error: 1 components of `...` were not used.

We detected these problematic arguments:
* `levels`

Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
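Since as_factor takes no levels argument, one possible workaround is to convert first and then reorder with fct_relevel. The following is a minimal sketch on a plain vector with the same values as above; the names by_appearance and reordered are only illustrative.

```r
library(forcats)

y <- c("post", "pre")

# as_factor() alone: levels follow the order of first appearance
by_appearance <- as_factor(y)
levels(by_appearance)
#> [1] "post" "pre"

# Convert first, then move the named levels to the front
reordered <- fct_relevel(as_factor(y), "pre", "post")
levels(reordered)
#> [1] "pre"  "post"
```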

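Also, in the tidyverse, piping a data frame into a function passes the whole data frame as the first argument, not the column y. We need to either extract the column with pull or .$, or modify the column within mutate.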
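As a sketch of both routes, assuming the same two-row data frame as in the question, the base factor function (which does accept levels) can be used either inside mutate or on a column extracted with pull. The names df_mutate and y_factor are made up for this example.

```r
library(dplyr)

df <- data.frame(x = c(1, 2), y = c("post", "pre"))

# Inside mutate(), y refers to the column, so base factor() and its
# levels argument can be used directly in the pipeline
df_mutate <- df %>%
  mutate(y = factor(y, levels = c("pre", "post")))
levels(df_mutate$y)
#> [1] "pre"  "post"

# Alternatively, pull() extracts the column as a plain vector first
y_factor <- df %>%
  pull(y) %>%
  factor(levels = c("pre", "post"))
levels(y_factor)
#> [1] "pre"  "post"
```

The mutate version is probably the closer match to what the question asks for, since it keeps the result inside the data frame.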

CodePudding user response:

We could also use relevel:

df <- data.frame(x = c(1,2), y = c('post','pre')) 

library(dplyr)
df <- df %>% 
  # relevel() takes a single reference level and moves it to the front
  mutate(y = relevel(as.factor(y), ref = 'pre'))

df$y
levels(df$y)

  x    y
1 1 post
2 2  pre

> df$y
[1] post pre 
Levels: pre post
> levels(df$y)
[1] "pre"  "post"
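Note that relevel only moves one reference level to the front, which is enough with two levels. The sketch below uses a hypothetical third level, 'followup', that is not part of the original data, to show how this differs from giving the full order to factor.

```r
# Hypothetical three-level example; 'followup' is not in the original data
y3 <- factor(c("post", "pre", "followup"))
levels(y3)             # sorted by default: "followup" "post" "pre"

# relevel() moves only the reference level to the front;
# the remaining levels keep their existing order
moved <- relevel(y3, ref = "pre")
levels(moved)          # "pre" "followup" "post"

# For a complete custom order, pass every level to factor()
full_order <- factor(y3, levels = c("pre", "post", "followup"))
levels(full_order)     # "pre" "post" "followup"
```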