Recode when a part of certain text is included in R

Time:10-05

I need to recode a variable when it contains a certain text. Here is what a sample dataset looks like:

df <- data.frame(id = c(1,2,3,4,5,6),
                 var1 = c("Discontinue", "Discontunie","discontinue", "disc","DISCONTINUE","NR"))

> df
  id        var1
1  1 Discontinue
2  2 Discontunie
3  3 discontinue
4  4        disc
5  5 DISCONTINUE
6  6          NR

var1 holds discontinue information with some typos, mixed upper and lower case, etc. I believe matching on the text disc would be a good way to identify those values. I need to recode var1 as discontinue. How can I get the following result?

> df
  id        var1
1  1 discontinue
2  2 discontinue
3  3 discontinue
4  4 discontinue
5  5 discontinue
6  6          NR

CodePudding user response:

df <- data.frame(id = c(1,2,3,4,5,6),
                 var1 = c("Discontinue", "Discontunie","discontinue", "disc","DISCONTINUE","NR"))


df$var1 <- ifelse(grepl("^disc", df$var1, ignore.case = TRUE), "discontinue", df$var1)
df
#>   id        var1
#> 1  1 discontinue
#> 2  2 discontinue
#> 3  3 discontinue
#> 4  4 discontinue
#> 5  5 discontinue
#> 6  6          NR

Created on 2022-10-04 with reprex v2.0.2
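
The `^` in the pattern anchors the match to the start of the string, so only values that begin with "disc" are recoded. Here is a minimal sketch of how that differs from an unanchored pattern. The vector `x` holds hypothetical values that are not in the question's data, including an `NA`:

```r
# Hypothetical values (not from the question) to show what each pattern catches
x <- c("Discontinue", "disc", "not discontinued", "NR", NA)

# Anchored: TRUE only where the value starts with "disc" (any case)
starts_with_disc <- grepl("^disc", x, ignore.case = TRUE)
starts_with_disc
#> [1]  TRUE  TRUE FALSE FALSE FALSE

# Unanchored: TRUE wherever "disc" appears anywhere in the value
contains_disc <- grepl("disc", x, ignore.case = TRUE)
contains_disc
#> [1]  TRUE  TRUE  TRUE FALSE FALSE

# grepl() returns FALSE for NA, so missing values pass through unchanged
recoded <- ifelse(starts_with_disc, "discontinue", x)
```

Which pattern is appropriate depends on whether "disc" could legitimately appear in the middle of other values in the real data.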

CodePudding user response:

The following should do the trick. It uses grep to find the rows where var1 contains the text disc, regardless of case (ignore.case = TRUE), and replaces those values with "discontinue":

df[grep("disc", df$var1, ignore.case = TRUE), "var1"] <- "discontinue"

Output:

#   id        var1
# 1  1 discontinue
# 2  2 discontinue
# 3  3 discontinue
# 4  4 discontinue
# 5  5 discontinue
# 6  6          NR