Relabel levels of factors as desired

Time:11-03

I would like to relabel the levels of factors as follows:

If a level has three or more words, relabel it in sentence case; otherwise use title case.

For example, "home doing nothing" becomes "Home doing nothing", "yes" becomes "Yes", and "good practice" becomes "Good Practice".

Is there any way to do this?

library(tidyverse)

vars <- c("a", "b", "c", "d", "e")


mydata <- tribble(
  ~"a", ~"b", ~"c", ~"d", ~"e", ~"id",
  "yes", "school in Kenya", "r", 10, "good practice", 1,
  "no", "home doing nothing", "python", 12, "doing well", 3,
  "no", "school in Tanzania", "c  ", 35, "by walking", 4,
  "yes", "home practising", "c", 65, "practising everyday", 5,
  "no", "home", "java", 78, "sitting alone", 7
) %>%
  mutate(across(all_of(vars), as_factor))


# mydata %>%
#   mutate(across(where(is.factor), ~fct_relabel(., str_to_sentence(.))))

CodePudding user response:

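Here is one possible solution. Note that it operates on character columns: the output below shows <chr> columns, i.e. the tribble before the as_factor() step. On columns that are already factors, where(is.character) selects nothing.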

library(dplyr)
library(stringr)

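# Values with three or more words: capitalise only the first letter
# (so "Kenya" keeps its capital). All other values: title case.
mydata %>%
  mutate(across(where(is.character),
                ~ if_else(stringi::stri_count_words(.x) >= 3,
                          sub("^([a-z])", "\\U\\1\\E", .x, perl = TRUE),
                          str_to_title(.x))))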

# A tibble: 5 x 6
  a     b                  c          d e                      id
  <chr> <chr>              <chr>  <dbl> <chr>               <dbl>
1 Yes   School in Kenya    R         10 Good Practice           1
2 No    Home doing nothing Python    12 Doing Well              3
3 No    School in Tanzania C         35 By Walking              4
4 Yes   Home Practising    C         65 Practising Everyday     5
5 No    Home               Java      78 Sitting Alone           7
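
If the columns are factors, as the as_factor() step in the question produces, the same word-count rule could be applied to the levels with fct_relabel(). The following is only a sketch: relabel_by_words is a hypothetical helper name, not something from the original post, and it assumes that "sentence case" means capitalising the first letter while leaving the rest untouched (str_to_sentence() would also lower-case "Kenya").

```r
library(forcats)
library(stringr)

# Hypothetical helper: takes a character vector of level labels and
# returns the relabelled vector (same length, same order).
relabel_by_words <- function(labels) {
  # Count words as runs of non-whitespace characters
  n_words <- str_count(labels, "\\S+")
  ifelse(
    n_words >= 3,
    # Three or more words: upper-case the first character only
    paste0(str_to_upper(str_sub(labels, 1, 1)), str_sub(labels, 2)),
    # Fewer than three words: title case
    str_to_title(labels)
  )
}

# Small factor built from the question's column b (levels in order of appearance)
b <- as_factor(c("school in Kenya", "home doing nothing", "home practising", "home"))

levels(fct_relabel(b, relabel_by_words))
#> [1] "School in Kenya"    "Home doing nothing" "Home Practising"    "Home"
```

On the question's data this would be applied with mutate(across(where(is.factor), ~ fct_relabel(.x, relabel_by_words))), which leaves the numeric id column untouched.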

CodePudding user response:

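Here are two options: 1) handle the >= 3 and < 3 cases in separate across() calls, or 2) handle both in a single call. Each has pros and cons for readability. Note that both branch on the number of levels in each factor rather than the number of words in a label, and the output below assumes every column (including d and id) has been converted to a factor.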

library(tidyverse)

# Option 1: separate across() calls for factors with >= 3 levels and < 3 levels
mydata |>
  mutate(across(where(\(x) length(levels(x)) >= 3), 
              \(x) fct_relabel(x, str_to_sentence)),
         across(where(\(x) length(levels(x)) < 3), 
              \(x) fct_relabel(x, str_to_title)))
#> # A tibble: 5 x 6
#>   a     b                  c      d     e                   id   
#>   <fct> <fct>              <fct>  <fct> <fct>               <fct>
#> 1 Yes   School in kenya    R      10    Good practice       1    
#> 2 No    Home doing nothing Python 12    Doing well          3    
#> 3 No    School in tanzania C      35    By walking          4    
#> 4 Yes   Home practising    C      65    Practising everyday 5    
#> 5 No    Home               Java   78    Sitting alone       7

# Option 2: pick the relabelling per column inside a single across() call
mydata |>
  mutate(across(everything(), \(x) if_else(
    rep(length(levels(x)) >= 3, length(x)), 
    fct_relabel(x, str_to_sentence),
    fct_relabel(x, str_to_title)
  )))
#> # A tibble: 5 x 6
#>   a     b                  c      d     e                   id   
#>   <fct> <fct>              <fct>  <fct> <fct>               <fct>
#> 1 Yes   School in kenya    R      10    Good practice       1    
#> 2 No    Home doing nothing Python 12    Doing well          3    
#> 3 No    School in tanzania C      35    By walking          4    
#> 4 Yes   Home practising    C      65    Practising everyday 5    
#> 5 No    Home               Java   78    Sitting alone       7