I have a data frame and want to add a new column that is derived from an existing column, with its values recoded.
For example, column ID.old is what I have:
df1 <- structure(list(ID.old=c(1,1,1, 2,2, 3,3,3,3, 4,4, 5,5,5,5,5, 6,6,6, 7,7,7,7, 8,8, 9, 10,10,10, 11,11, 12,12,12, 13,13, 14,14,14,14, 15,15, 16, 17,17, 18, 19,19,19, 20,20,20)),
class = "data.frame", row.names = c(NA,-52L))
and column ID.new is what I need:
df2 <- structure(list(ID.old=c(1,1,1, 2,2, 3,3,3,3, 4,4, 5,5,5,5,5, 6,6,6, 7,7,7,7, 8,8, 9, 10,10,10, 11,11, 12,12,12, 13,13, 14,14,14,14, 15,15, 16, 17,17, 18, 19,19,19, 20,20,20),
ID.new=c('a1','a1','a1', 'a2','a2', 'a3','a3','a3','a3', 'a4','a4', 'a5','a5','a5','a5','a5', 'a1','a1','a1', 'a2','a2','a2','a2', 'a3','a3', 'a4', 'a5','a5','a5', 'a1','a1', 'a2','a2','a2', 'a3','a3', 'a4','a4','a4','a4', 'a5','a5', 'a1', 'a2','a2', 'a3', 'a4','a4','a4', 'a5','a5','a5')),
class = "data.frame", row.names = c(NA,-52L))
I thought I could use str_replace_all from stringr, but it produces something different:
library(dplyr)    # provides %>% and mutate()
library(stringr)
df1 <- df1 %>%
  mutate(ID.new = ID.old)
replace = c("1"="a1", "2"="a2", "3"="a3", "4"="a4", "5"="a5",
"6"="a1", "7"="a2", "8"="a3", "9"="a4", "10"="a5",
"11"="a1", "12"="a2", "13"="a3", "14"="a4", "15"="a5",
"16"="a1", "17"="a2", "18"="a3", "19"="a4", "20"="a5")
df1$ID.new<- str_replace_all(df1$ID.new, replace)
In my original data frame I have many rows. Specifically, wherever ID.old is 1, 6, 11 or 16 I need "a1", wherever it is 2, 7, 12 or 17 I need "a2", and so on.
How can I get a column like ID.new in df2? Thanks.
CodePudding user response:
You could use modulo %% and replace zeros with 5.
res <- transform(df1, ID.new=paste0('a', ID.old %% 5 |> {\(.) replace(., . == 0, 5)}()))
head(res, 17)
# ID.old ID.new
# 1 1 a1
# 2 1 a1
# 3 1 a1
# 4 2 a2
# 5 2 a2
# 6 3 a3
# 7 3 a3
# 8 3 a3
# 9 3 a3
# 10 4 a4
# 11 4 a4
# 12 5 a5
# 13 5 a5
# 14 5 a5
# 15 5 a5
# 16 5 a5
# 17 6 a1
Data:
df1 <- structure(list(ID.old = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5,
5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 13, 13, 14, 14, 14, 14, 15, 15, 16, 17, 17, 18, 19,
19, 19, 20, 20, 20)), class = "data.frame", row.names = c(NA,
-52L))
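As a variation on the modulo idea above, the zero-replacement step can be avoided by shifting the IDs to 0-based before taking the remainder and shifting back afterwards. This is only a sketch on a small toy vector (the values in `ID.old` here are illustrative, not the full data), and it assumes the IDs are positive whole numbers that cycle in groups of 5.

```r
# Toy IDs for illustration; with the real data this would be df1$ID.old
ID.old <- c(1, 5, 6, 10, 11, 15, 16, 20)

# Shift to 0-based, wrap every 5, then shift back to 1-based,
# so multiples of 5 map to 5 instead of 0
cycle <- (ID.old - 1) %% 5 + 1

ID.new <- paste0("a", cycle)
ID.new
# [1] "a1" "a5" "a1" "a5" "a1" "a5" "a1" "a5"
```

On the full data frame the same expression could be used as `transform(df1, ID.new = paste0("a", (ID.old - 1) %% 5 + 1))`.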
CodePudding user response:
stringr::str_replace_all is based on regex, so each pattern can match anywhere inside a string. For example, with your 'replace' dictionary, it replaces every 1 it encounters with "a1", so the number '11' is replaced by "a1a1", as it contains two successive 1s. Since you have already designed a dictionary, you can simply add the start (^) and end ($) regex anchors, as I suggest below:
- Simply add this line of code after creating your current 'replace' dictionary:
names(replace) = paste0("^", names(replace), "$")
- And now the replacement is correct if you run this again:
df1$ID.new<- str_replace_all(df1$ID.new, replace)
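To see the effect of the anchors side by side, here is a minimal sketch on a three-element toy vector. The names `ids`, `lookup` and `anchored_lookup` are made up for this illustration and the dictionary is a shortened version of the one in the question.

```r
library(stringr)

ids <- c("1", "11", "12")   # toy input, not the full data
lookup <- c("1" = "a1", "2" = "a2", "11" = "a1", "12" = "a2")

# Unanchored: each pattern may match anywhere inside the string
unanchored <- str_replace_all(ids, lookup)

# Anchored: ^ and $ force each pattern to match the whole string
anchored_lookup <- lookup
names(anchored_lookup) <- paste0("^", names(lookup), "$")
anchored <- str_replace_all(ids, anchored_lookup)

# Compare the two results next to the input
data.frame(ids, unanchored, anchored)
```

The `anchored` column should read a1, a1, a2, while the `unanchored` column shows the partial-match problem described above.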