Home > OS >  r recoding ordered factor values to numeric
r recoding ordered factor values to numeric

Time:04-03

This is a tricky situation. My dataset is like this

Id    Col1
1     Pre01
1     Peri01
1     Inf02
1     EK01
1     MD02
1     Ad05
2     EK02
2     EK05
3     EK01
3     MD01
4     CO
4     EK01

The order is Pre->Peri->Inf->EK->MD->Ad->CO

If an ID has Col1 values with only Pre,Peri,Inf,Ek,MD,Ado or CO I can recode these values to 1, 2,3,4,5 ,6 or7 using case_when statement. The random trailing numbers complicate things for me.

For example , ID 2,the values EK02 and EK05 should be coded 1,2 ..so on,

Expected dataset

Id    Col1     Col2
1     Pre01    1
1     Peri01   2
1     Inf02    3
1     EK01     4
1     MD02     5
1     Ad05     6
2     EK02     1
2     EK05     2
3     EK01     1
3     MD01     2
4     CO       2
4     EK01     1

Not sure how to recode ordered factors with trailing numbers. Any help is apricated. Thanks in advance.

CodePudding user response:

A possible solution to the first half of the problem is using str_replace_all to change 0s into . so we can then use separate to detach the digits from the code. This relies on the assumption that no numbers start with any digit other than 0.

library(dplyr)
library(tidyr)
library(stringr)

dat |> mutate(Col1 = str_replace_all(Col1, "0", ".")) |> 
       separate(Col1, c("code", "digits"))

CodePudding user response:

I think this solution may work but you have to try it on your original data set.

library(dplyr)

vec <- c(Pre, Peri, Inf, EK, MD, Ad, CO)

df %>%
  rowwise() %>%
  mutate(Col2 = which(!is.na(match(vec, gsub("(\\D )\\d ", "\\1", Col1))))) %>%
  group_by(Id) %>%
  mutate(Col3 = ifelse(grepl("\\d", Col1), gsub("\\w (?:0)(\\d )", "\\1", Col1), Col2)) %>%
  arrange(Id, Col2, Col3) %>%
  mutate(Col3 = row_number()) %>%
  select(-Col2)

# A tibble: 12 x 3
# Groups:   Id [4]
      Id Col1    Col3
   <int> <chr>  <int>
 1     1 Pre01      1
 2     1 Peri01     2
 3     1 Inf02      3
 4     1 EK01       4
 5     1 MD02       5
 6     1 Ad05       6
 7     2 EK02       1
 8     2 EK05       2
 9     3 EK01       1
10     3 MD01       2
11     4 EK01       1
12     4 CO         2
  • Related