I need to recode a variable when it contains a certain text. Here is what a sample dataset looks like:
df <- data.frame(id = c(1,2,3,4,5,6),
var1 = c("Discontinue", "Discontunie","discontinue", "disc","DISCONTINUE","NR"))
> df
id var1
1 1 Discontinue
2 2 Discontunie
3 3 discontinue
4 4 disc
5 5 DISCONTINUE
6 6 NR
var1 holds discontinue information with some typos and mixed upper and lower case. I believe matching on the text disc would be a good way to catch those values. I need to recode var1 as discontinue for those rows. How can I get the following result?
> df
id var1
1 1 discontinue
2 2 discontinue
3 3 discontinue
4 4 discontinue
5 5 discontinue
6 6 NR
CodePudding user response:
df <- data.frame(id = c(1,2,3,4,5,6),
var1 = c("Discontinue", "Discontunie","discontinue", "disc","DISCONTINUE","NR"))
df$var1 <- ifelse(grepl("^disc", df$var1, ignore.case = TRUE), "discontinue", df$var1)
df
#> id var1
#> 1 1 discontinue
#> 2 2 discontinue
#> 3 3 discontinue
#> 4 4 discontinue
#> 5 5 discontinue
#> 6 6 NR
Created on 2022-10-04 with reprex v2.0.2
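If the same recoding is needed for several columns, the grepl() logic above could be wrapped in a small function. This is only a sketch: the name recode_disc and its arguments are made up for illustration, and the as.character() call is an added assumption to guard against var1 being stored as a factor.

```r
# Hypothetical helper (not from the answer above): recode every value
# matching `pattern`, ignoring case, to `replacement`.
recode_disc <- function(x, pattern = "^disc", replacement = "discontinue") {
  x <- as.character(x)                           # assumption: factors are converted first
  hit <- grepl(pattern, x, ignore.case = TRUE)   # TRUE where the value starts with "disc"
  x[hit] <- replacement                          # overwrite only the matching values
  x
}

df <- data.frame(id = c(1,2,3,4,5,6),
                 var1 = c("Discontinue", "Discontunie","discontinue", "disc","DISCONTINUE","NR"))
df$var1 <- recode_disc(df$var1)
```

Values that do not match, such as "NR", are returned unchanged, and missing values stay NA because grepl() returns FALSE for them.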
CodePudding user response:
The following should do the trick. It identifies the rows where var1 contains the text disc using grep, regardless of case (ignore.case = TRUE), and replaces those values with "discontinue":
df[grep("disc", df$var1, ignore.case = TRUE), "var1"] <- "discontinue"
Output:
# id var1
# 1 1 discontinue
# 2 2 discontinue
# 3 3 discontinue
# 4 4 discontinue
# 5 5 discontinue
# 6 6 NR
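One thing to be aware of: the pattern "disc" matches anywhere in the string, while "^disc" only matches at the start. On the sample data both give the same result, but they can differ on other values. A small sketch, where "Undisclosed" is a made-up value that is not in the original data:

```r
# "Undisclosed" is a hypothetical value used only to show the difference
x <- c("disc", "Undisclosed", "NR")

anywhere <- grepl("disc", x, ignore.case = TRUE)    # matches "disc" anywhere in the string
at_start <- grepl("^disc", x, ignore.case = TRUE)   # matches only strings starting with "disc"

anywhere
#> [1]  TRUE  TRUE FALSE
at_start
#> [1]  TRUE FALSE FALSE
```

If the column can contain words that merely include disc, the anchored pattern is probably the safer choice.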