This is a tricky situation. My dataset is like this
Id Col1
1 Pre01
1 Peri01
1 Inf02
1 EK01
1 MD02
1 Ad05
2 EK02
2 EK05
3 EK01
3 MD01
4 CO
4 EK01
The order is Pre->Peri->Inf->EK->MD->Ad->CO
If an ID has Col1 values with only Pre,Peri,Inf,Ek,MD,Ado or CO I can recode these values to 1, 2,3,4,5 ,6 or7 using case_when
statement. The random trailing numbers complicate things for me.
For example , ID 2,the values EK02 and EK05 should be coded 1,2 ..so on,
Expected dataset
Id Col1 Col2
1 Pre01 1
1 Peri01 2
1 Inf02 3
1 EK01 4
1 MD02 5
1 Ad05 6
2 EK02 1
2 EK05 2
3 EK01 1
3 MD01 2
4 CO 2
4 EK01 1
Not sure how to recode ordered factors with trailing numbers. Any help is apricated. Thanks in advance.
CodePudding user response:
A possible solution to the first half of the problem is using str_replace_all
to change 0
s into .
so we can then use separate
to detach the digits from the code. This relies on the assumption that no numbers start with any digit other than 0
.
library(dplyr)
library(tidyr)
library(stringr)
dat |> mutate(Col1 = str_replace_all(Col1, "0", ".")) |>
separate(Col1, c("code", "digits"))
CodePudding user response:
I think this solution may work but you have to try it on your original data set.
library(dplyr)
vec <- c(Pre, Peri, Inf, EK, MD, Ad, CO)
df %>%
rowwise() %>%
mutate(Col2 = which(!is.na(match(vec, gsub("(\\D )\\d ", "\\1", Col1))))) %>%
group_by(Id) %>%
mutate(Col3 = ifelse(grepl("\\d", Col1), gsub("\\w (?:0)(\\d )", "\\1", Col1), Col2)) %>%
arrange(Id, Col2, Col3) %>%
mutate(Col3 = row_number()) %>%
select(-Col2)
# A tibble: 12 x 3
# Groups: Id [4]
Id Col1 Col3
<int> <chr> <int>
1 1 Pre01 1
2 1 Peri01 2
3 1 Inf02 3
4 1 EK01 4
5 1 MD02 5
6 1 Ad05 6
7 2 EK02 1
8 2 EK05 2
9 3 EK01 1
10 3 MD01 2
11 4 EK01 1
12 4 CO 2