R replace specific value in many columns across dataframe

I am mainly interested in replacing a specific value (81) in many columns across the dataframe.

For example, suppose this is my dataset:

    Id         Date         Col_01     Col_02   Col_03       Col_04
    30         2012-03-31   1          A42.2    20.46        43  
    36         1996-11-15   42         V73      23           55
    96         2010-02-07   X48        81       13           3R
    40         2010-03-18   AD14       18.12    20.12        36
    69         2012-02-21   8          22.45    12           10                 
    11         2013-07-03   81         V017     78.12        81         
    22         2001-06-01   11         09       55           12
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   1          4        67           12 
    34         2014-03-10   82.12      N72.22   V45.44       10

I would like to replace the value 81 in columns Col_01, Col_02, Col_03 and Col_04 with NA. The expected final dataset looks like this:

    Id         Date         Col_01     Col_02   Col_03       Col_04
    30         2012-03-31   1          A42.2    20.46        43  
    36         1996-11-15   42         V73      23           55
    96         2010-02-07   X48        NA       13           3R
    40         2010-03-18   AD14       18.12    20.12        36
    69         2012-02-21   8          22.45    12           10                 
    11         2013-07-03   NA         V017     78.12        NA
    22         2001-06-01   11         09       55           12
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   1          4        67           12 
    34         2014-03-10   82.12      N72.22   V45.44       10

I tried this approach

df %>% select(matches("^Col_\\d+$"))[ df %>% select(matches("^Col_\\d+$")) == 81 ] <- NA

This is similar to the solution data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10 from the question "Replacing occurrences of a number in multiple columns of data frame with another value in R".

This did not work.

Any suggestion is much appreciated. Thanks in advance.

CodePudding user response：

Instead of select, we can pass matches directly to across inside mutate and use na_if to convert the values that are "81" to NA

library(dplyr)
df <- df %>%
   mutate(across(matches("^Col_\\d+$"), ~ na_if(., "81")))

-output

df
   Id       Date Col_01 Col_02 Col_03 Col_04
1  30 2012-03-31      1  A42.2  20.46     43
2  36 1996-11-15     42    V73     23     55
3  96 2010-02-07    X48   <NA>     13     3R
4  40 2010-03-18   AD14  18.12  20.12     36
5  69 2012-02-21      8  22.45     12     10
6  11 2013-07-03   <NA>   V017  78.12   <NA>
7  22 2001-06-01     11     09     55     12
8  83 2005-03-16  80.45 V22.15  46.52 X29.11
9  92 2012-02-12      1      4     67     12
10 34 2014-03-10  82.12 N72.22 V45.44     10

Or we can use base R

i1 <- grep("^Col_\\d+$", names(df))
df[i1][df[i1] == "81"] <- NA
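
To make the base R steps easier to follow, here is a minimal sketch on a small made-up data frame (toy, mask and n_hits are illustrative names, not part of the OP's data). It shows that the comparison yields a logical matrix over the matched columns only, so an 81 in Id is left alone.

```r
# Hypothetical toy data, not the OP's dataset
toy <- data.frame(Id = c(81L, 36L, 96L),
                  Col_01 = c("1", "81", "X48"),
                  Col_02 = c("81", "V73", "18.12"),
                  stringsAsFactors = FALSE)

# Positions of the columns whose names look like Col_<digits>
i1 <- grep("^Col_\\d+$", names(toy))

# Logical matrix: TRUE wherever a matched column holds "81"
mask <- toy[i1] == "81"
n_hits <- sum(mask)   # number of cells that will be set to NA

# Replace only those cells; Id is not part of toy[i1], so it is untouched
toy[i1][mask] <- NA
```

Keeping the mask in its own variable also makes it easy to check how many cells are affected before overwriting anything.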

The issue with the OP's code is that the assignment is not triggered as we expect. The expression works for extraction, i.e.

(df %>% 
     select(matches("^Col_\\d+$")))[(df %>% 
        select(matches("^Col_\\d+$"))) == "81" ]
[1] "81" "81" "81"

which is the same as

df[i1][df[i1] == "81"]
[1] "81" "81" "81"

but not for assignment

(df %>% 
      select(matches("^Col_\\d+$")))[(df %>% 
         select(matches("^Col_\\d+$"))) == "81" ] <- NA
Error in (df %>% select(matches("^Col_\\d+$")))[(df %>% select(matches("^Col_\\d+$"))) ==  : 
  could not find function "(<-"

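In base R, df[i1][...] <- NA starts from the named object df, so the assignment is carried out with [<-. A piped expression wrapped in ( ) is not a valid assignment target, hence the missing "(<-" function.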
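
As a rough illustration of why the base R form works, the nested assignment can be written out by hand. This is only a sketch on made-up data (toy, tmp, manual and short are hypothetical names); R does roughly the same steps internally, and the last one needs a named object to write back into.

```r
# Hypothetical toy data, not the OP's dataset
toy <- data.frame(Id = c(81L, 2L),
                  Col_01 = c("81", "7"),
                  Col_02 = c("A1", "81"),
                  stringsAsFactors = FALSE)
i1 <- grep("^Col_\\d+$", names(toy))

# The nested assignment, spelled out step by step
tmp <- toy[i1]            # 1. extract the matched columns
tmp[tmp == "81"] <- NA    # 2. modify the extracted copy with `[<-`
manual <- toy
manual[i1] <- tmp         # 3. write the copy back into a named object

# The one-liner from the answer
short <- toy
short[i1][short[i1] == "81"] <- NA

# Both routes should give the same data frame
same <- identical(manual, short)
```

With the piped version there is no named object for step 3, which is where the "(<-" error comes from.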

data

df <- structure(list(Id = c(30L, 36L, 96L, 40L, 69L, 11L, 22L, 83L, 
92L, 34L), Date = c("2012-03-31", "1996-11-15", "2010-02-07", 
"2010-03-18", "2012-02-21", "2013-07-03", "2001-06-01", "2005-03-16", 
"2012-02-12", "2014-03-10"), Col_01 = c("1", "42", "X48", "AD14", 
"8", "81", "11", "80.45", "1", "82.12"), Col_02 = c("A42.2", 
"V73", "81", "18.12", "22.45", "V017", "09", "V22.15", "4", "N72.22"
), Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12", "55", 
"46.52", "67", "V45.44"), Col_04 = c("43", "55", "3R", "36", 
"10", "81", "12", "X29.11", "12", "10")),
 class = "data.frame", row.names = c(NA, 
-10L))

CodePudding user response：

We can also use replace:

library(dplyr)

df <- df %>%
   mutate(across(matches("^Col_\\d+$"), ~ replace(.x, .x == "81", NA)))
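
For reference, here is a small base R sketch of what replace does on a single vector, and the same idea applied column by column with lapply. The vector x and the data frame toy are made up for illustration. The second argument of replace has to be an index such as a logical mask, not a formula.

```r
# Hypothetical toy vector, not taken from the OP's data
x <- c("1", "81", "X48", "81")

# replace(x, index, value): here the index is a logical mask
x_clean <- replace(x, x == "81", NA)

# Same idea over the matched columns of a toy data frame, base R only
toy <- data.frame(Id = c(81L, 36L),
                  Col_01 = c("81", "42"),
                  Col_02 = c("V73", "81"),
                  stringsAsFactors = FALSE)
cols <- grep("^Col_\\d+$", names(toy))
toy[cols] <- lapply(toy[cols], function(col) replace(col, col == "81", NA))
```

Since the columns are character, comparing against the string "81" avoids relying on implicit coercion of the number 81.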