Change a level of a factor after another level


I want to change the order of the levels of a factor so that a specific level comes right after another level, but I'm struggling to do it efficiently.

Let's assume that we want to reorder the levels of the following factor so that "20" comes right after "10". I tried this and successfully got the expected result:

library(tidyverse)

sample_factor <- factor(1:30)

trial_factor1 <- sample_factor %>% fct_relevel("20", after=which(levels(.)=="10"))
levels(trial_factor1)
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "20" "11" "12" "13" "14"
#> [16] "15" "16" "17" "18" "19" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"

However, if the order of the initial factor is reversed, it doesn't work:

trial_factor2 <- sample_factor %>% fct_rev() %>% fct_relevel("20", after=which(levels(.)=="10"))
levels(trial_factor2)
#>  [1] "30" "29" "28" "27" "26" "25" "24" "23" "22" "21" "19" "18" "17" "16" "15"
#> [16] "14" "13" "12" "11" "10" "9"  "20" "8"  "7"  "6"  "5"  "4"  "3"  "2"  "1"

This is probably because, in this case, "20" is initially positioned before "10".

In addition, if I also try to change the order so that "30" comes right after "20" (expected factor levels: ..., 10, 20, 30, ...), the result gets worse:

trial_factor3 <- sample_factor %>% fct_rev() %>% fct_relevel("20", after=which(levels(.)=="10")) %>%
  fct_relevel("30", after=which(levels(.)=="20"))
levels(trial_factor3)
#>  [1] "29" "28" "27" "26" "25" "24" "23" "22" "21" "19" "18" "17" "16" "15" "14"
#> [16] "13" "12" "11" "10" "9"  "20" "8"  "30" "7"  "6"  "5"  "4"  "3"  "2"  "1"

Created on 2022-08-18 by the reprex package (v2.0.1)

In my real situation, I want to change the order of levels multiple times (more than 5 times) and I don't clearly know the initial order of the factor levels, so I find it really difficult to change the order flexibly.

I really appreciate your help in advance!

CodePudding user response:

It seems fct_relevel is designed to work positionally: it makes the given level the nth level, where n = after + 1 (a somewhat strange design/naming decision). You, however, want to work based only on the level names (labels).
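To make the positional behaviour concrete, here is a minimal base-R sketch of how after seems to be applied. The function relevel_sketch is a made-up name for illustration, and the two-step "remove, then insert" logic is an assumption inferred from the outputs in the question, not the actual forcats source.

```r
# Sketch only: mimics how `after` appears to be applied. This is an assumption
# about fct_relevel's behaviour, not its actual source.
relevel_sketch <- function(lev, move, after) {
  rest <- setdiff(lev, move)         # drop the moved level first
  append(rest, move, after = after)  # then insert after position `after` of the rest
}

up <- as.character(1:5)
down <- as.character(5:1)

# "4" starts after "2": removing it does not shift "2", so the position is right
relevel_sketch(up, "4", after = which(up == "2"))
#> [1] "1" "2" "4" "3" "5"

# "4" starts before "2": removing it shifts "2" left, so the position is one too far
relevel_sketch(down, "4", after = which(down == "2"))
#> [1] "5" "3" "2" "1" "4"
```

The second call reproduces the kind of overshoot seen with the reversed factor in the question.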

We can write our own version that does this, translating the after label to a position and accounting for whether the moved level currently sits before or after that label. (Also, why use 30 levels in a sample when 5 will do nicely?)

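fct_relevel_label = function(.f, level, after_label) {
  lev = levels(.f)
  move = which(lev == level)          # current position of the level to move
  target = which(lev == after_label)  # current position of the label to follow
  # removing a level that sits before the target shifts the target left by one
  after = if(move <= target) {target - 1} else {target}
  fct_relevel(.f, level, after = after)
}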

factor(1:5) %>% fct_relevel_label("2", after_label = "4") %>% levels
# [1] "1" "3" "4" "2" "5"

factor(1:5) %>% fct_rev() %>% fct_relevel_label("2", after_label = "4") %>% levels
# [1] "5" "4" "2" "3" "1"
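Since the question mentions several moves in a row, here is a hedged sketch of chaining them. It uses a hypothetical base-R helper, move_level_after (not part of forcats), so that it runs without any packages. Note that it swaps in a slightly different technique from the function above: instead of the before/after test, it looks up the target's position after the moved level has been removed.

```r
# Hypothetical base-R helper (not part of forcats): move `level` so that it
# directly follows `after_label`, looking the target up after removal.
move_level_after <- function(f, level, after_label) {
  stopifnot(level %in% levels(f), after_label %in% levels(f), level != after_label)
  rest <- setdiff(levels(f), level)
  new_levels <- append(rest, level, after = match(after_label, rest))
  factor(f, levels = new_levels)
}

# Each pair is c(level_to_move, label_it_should_follow), applied in order
moves <- list(c("20", "10"), c("30", "20"))
start <- factor(1:30, levels = 30:1)   # reversed initial order
result <- Reduce(function(f, m) move_level_after(f, m[1], m[2]), moves, start)

levels(result)[19:21]
#> [1] "10" "20" "30"
```

Because every step works from labels only, the same list of moves should give 10, 20, 30 as neighbours regardless of the initial order of the levels.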

CodePudding user response:

The issue is that the after argument specifies a position among the levels that remain once the moved level has been taken out, whereas with after = which(levels(.) == "10") you are computing after from the current position of the target. Thus, if the moved level is removed from earlier in the order, the target location needs to be shifted down by one. If it's moved from somewhere after the destination, no adjustment is needed. For your application, I think you need to test which of these situations you have. Here's a small helper function that tests this and returns the appropriately positioned relevel.

Note: If you're going to be moving more than one level at a time, you will have to make the function a bit more complex to test for how many levels are coming from before the destination and adjust accordingly.

library(tidyverse)

f <- factor(1:5)

# works because nothing is removed from before the "after" position
f %>% fct_relevel("5", after = which(levels(.) == "3")) %>% levels()
#> [1] "1" "2" "3" "5" "4"

# fails because you are removing one element from before the "after" position
# so the new location should be shifted by 1
f %>% fct_relevel("3", after = which(levels(.) == "4")) %>% levels()
#> [1] "1" "2" "4" "5" "3"

# this function tests whether the moved level comes from before or after the destination
fct_relevel_after <- function(fct, lev, after){
  l <- levels(fct)
  a <- match(lev, l)
  b <- match(after, l)
  if(a < b) {
    return(fct_relevel(fct, lev, after = b-1))
  } else {
    return(fct_relevel(fct, lev, after = b))
  }
}

# both work as desired
f %>% fct_relevel_after("5", "3") %>% levels()
#> [1] "1" "2" "3" "5" "4"
f %>% fct_relevel_after("3", "4") %>% levels()
#> [1] "1" "2" "4" "3" "5"

Created on 2022-08-17 by the reprex package (v2.0.1)
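For the multi-level case mentioned in the note above, a possible extension is sketched below. It counts how many of the moved levels currently sit before the destination and shifts the position by that count. The name relevel_after_multi is hypothetical, and the sketch rebuilds the level order with base R (setdiff and append) rather than calling fct_relevel, so treat it as an illustration of the adjustment rather than a drop-in forcats solution.

```r
# Hypothetical helper: move several levels so they directly follow `after`,
# keeping the moved levels in the order given in `lev`.
relevel_after_multi <- function(fct, lev, after) {
  l <- levels(fct)
  b <- match(after, l)
  stopifnot(!is.na(b), all(lev %in% l), !(after %in% lev))
  # every moved level that currently sits before the destination shifts it left by one
  shift <- sum(match(lev, l) < b)
  new_levels <- append(setdiff(l, lev), lev, after = b - shift)
  factor(fct, levels = new_levels)
}

f <- factor(1:5)

# "1" comes from before "3" and "5" from after it, so the shift is 1
f |> relevel_after_multi(c("1", "5"), "3") |> levels()
#> [1] "2" "3" "1" "5" "4"

# same request on the reversed order: now only "5" sits before "3"
factor(1:5, levels = 5:1) |> relevel_after_multi(c("1", "5"), "3") |> levels()
#> [1] "4" "3" "1" "5" "2"
```

The native pipe used here needs R 4.1 or later; with the tidyverse loaded, %>% works the same way.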
