R replace specific value in many columns across dataframe

Time:10-04

I am mainly interested in replacing a specific value (81) in many columns across the dataframe.

For example, suppose this is my dataset:

    Id         Date         Col_01     Col_02   Col_03       Col_04
    30         2012-03-31   1          A42.2    20.46        43  
    36         1996-11-15   42         V73      23           55
    96         2010-02-07   X48        81       13           3R
    40         2010-03-18   AD14       18.12    20.12        36
    69         2012-02-21   8          22.45    12           10                 
    11         2013-07-03   81         V017     78.12        81         
    22         2001-06-01   11         09       55           12
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   1          4        67           12 
    34         2014-03-10   82.12      N72.22   V45.44       10

I would like to replace the value 81 in columns Col_01, Col_02, Col_03 and Col_04 with NA. The final expected dataset looks like this:

    Id         Date         Col_01     Col_02   Col_03       Col_04
    30         2012-03-31   1          A42.2    20.46        43  
    36         1996-11-15   42         V73      23           55
    96         2010-02-07   X48        NA       13           3R
    40         2010-03-18   AD14       18.12    20.12        36
    69         2012-02-21   8          22.45    12           10                 
    11         2013-07-03   NA         V017     78.12        NA
    22         2001-06-01   11         09       55           12
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   1          4        67           12 
    34         2014-03-10   82.12      N72.22   V45.44       10

I tried this approach:

df %>% select(matches("^Col_\\d+$"))[ df %>% select(matches("^Col_\\d+$")) == 81 ] <- NA

It is similar to the solution data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10 from the question "Replacing occurrences of a number in multiple columns of data frame with another value in R".

This did not work.

Any suggestion is much appreciated. Thanks in advance.

Answer:

Instead of select, we can put matches() directly inside mutate(across(...)) and use na_if to replace the values that are "81" with NA:

library(dplyr)
df <- df %>%
   mutate(across(matches("^Col_\\d+$"), ~ na_if(., "81")))

Output:

df
   Id       Date Col_01 Col_02 Col_03 Col_04
1  30 2012-03-31      1  A42.2  20.46     43
2  36 1996-11-15     42    V73     23     55
3  96 2010-02-07    X48   <NA>     13     3R
4  40 2010-03-18   AD14  18.12  20.12     36
5  69 2012-02-21      8  22.45     12     10
6  11 2013-07-03   <NA>   V017  78.12   <NA>
7  22 2001-06-01     11     09     55     12
8  83 2005-03-16  80.45 V22.15  46.52 X29.11
9  92 2012-02-12      1      4     67     12
10 34 2014-03-10  82.12 N72.22 V45.44     10
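The pattern passed to matches() is an ordinary regular expression, so it can be checked on its own in base R. A small sketch, using a hypothetical vector of column names (Col_2 and Col_01x are made up for illustration and are not in the question's data):

```r
# Hypothetical column names; Col_2 and Col_01x are invented for this check
col_names <- c("Id", "Date", "Col_01", "Col_2", "Col_01x")

# "^Col_\\d+$": the literal prefix "Col_", one or more digits, then end of string
is_col <- grepl("^Col_\\d+$", col_names)
print(col_names[is_col])
```

Id, Date and Col_01x are rejected because they either lack the prefix or have trailing characters after the digits.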

Or we can use base R:

i1 <- grep("^Col_\\d+$", names(df))
df[i1][df[i1] == "81"] <- NA
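A self-contained check of the base R approach, assuming a cut-down copy of the question's data (only the first six rows, for brevity). The comparison df_small[i1] == "81" produces a logical matrix, and a data frame can be indexed by such a matrix on the left-hand side of an assignment:

```r
# Cut-down copy of the question's data (first six rows only)
df_small <- data.frame(
  Id     = c(30L, 36L, 96L, 40L, 69L, 11L),
  Date   = c("2012-03-31", "1996-11-15", "2010-02-07",
             "2010-03-18", "2012-02-21", "2013-07-03"),
  Col_01 = c("1", "42", "X48", "AD14", "8", "81"),
  Col_02 = c("A42.2", "V73", "81", "18.12", "22.45", "V017"),
  Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12"),
  Col_04 = c("43", "55", "3R", "36", "10", "81"),
  stringsAsFactors = FALSE
)

# Indices of the columns named "Col_" followed by digits
i1 <- grep("^Col_\\d+$", names(df_small))

# Logical matrix of cells equal to "81"; assign NA to exactly those cells
df_small[i1][df_small[i1] == "81"] <- NA
print(df_small)
```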

The issue with the OP's code is that the assignment does not happen as expected. On its own, the left-hand side only extracts the matching values, i.e.

(df %>% 
     select(matches("^Col_\\d+$")))[(df %>% 
        select(matches("^Col_\\d+$"))) == "81" ]
[1] "81" "81" "81"

which is the same as

df[i1][df[i1] == "81"]
[1] "81" "81" "81"

However, the assignment itself fails:

(df %>% 
      select(matches("^Col_\\d+$")))[(df %>% 
         select(matches("^Col_\\d+$"))) == "81" ] <- NA
Error in (df %>% select(matches("^Col_\\d+$")))[(df %>% select(matches("^Col_\\d+$"))) ==  : 
  could not find function "(<-"

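In base R, df[i1][df[i1] == "81"] <- NA works because R rewrites it into calls to the replacement function [<- and assigns the result back to the name df. When the left-hand side is a parenthesised pipe expression there is no name to assign back to, so R looks for a replacement function called "(<-", which does not exist.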
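To make the rewriting concrete, the sketch below performs the two [<- calls by hand, working from the inside out, on a small hypothetical data frame (its values are made up). It approximates what the evaluator does; R's actual implementation goes through a temporary variable named *tmp*, which is omitted here.

```r
# Small hypothetical data frame with one column that should not be touched
df <- data.frame(Id = 1:3,
                 Col_01 = c("81", "5", "7"),
                 Col_02 = c("2", "81", "9"),
                 stringsAsFactors = FALSE)
i1 <- grep("^Col_\\d+$", names(df))

# Step 1: extract the target columns and set the matching cells to NA
inner <- df[i1]
inner <- `[<-`(inner, inner == "81", value = NA)

# Step 2: write the modified columns back into the full data frame
df <- `[<-`(df, i1, value = inner)

# The final step rebinds the name df. A parenthesised pipe such as
# (df %>% select(...)) is a call rather than a name, which is why R
# ends up searching for a replacement function "(<-" instead.
print(df)
```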

Data:

df <- structure(list(Id = c(30L, 36L, 96L, 40L, 69L, 11L, 22L, 83L, 
92L, 34L), Date = c("2012-03-31", "1996-11-15", "2010-02-07", 
"2010-03-18", "2012-02-21", "2013-07-03", "2001-06-01", "2005-03-16", 
"2012-02-12", "2014-03-10"), Col_01 = c("1", "42", "X48", "AD14", 
"8", "81", "11", "80.45", "1", "82.12"), Col_02 = c("A42.2", 
"V73", "81", "18.12", "22.45", "V017", "09", "V22.15", "4", "N72.22"
), Col_03 = c("20.46", "23", "13", "20.12", "12", "78.12", "55", 
"46.52", "67", "V45.44"), Col_04 = c("43", "55", "3R", "36", 
"10", "81", "12", "X29.11", "12", "10")),
 class = "data.frame", row.names = c(NA, 
-10L))

Answer:

We can also use replace:

library(dplyr)

df <- df %>%
   mutate(across(matches("^Col_\\d+$"), ~ replace(.x, .x == "81", NA)))
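replace(x, list, values) expects its second argument to be an index or logical vector, so the condition is passed as a plain comparison such as .x == "81" rather than as a formula. The same idea also works without dplyr; a minimal base R sketch, using a hypothetical two-row data frame:

```r
# Hypothetical data frame with two target columns
df <- data.frame(Id = c(30L, 11L),
                 Col_01 = c("1", "81"),
                 Col_04 = c("43", "81"),
                 stringsAsFactors = FALSE)

# Columns named "Col_" followed by one or more digits
cols <- grep("^Col_\\d+$", names(df))

# replace() takes a logical index, so pass the comparison itself
df[cols] <- lapply(df[cols], function(x) replace(x, x == "81", NA))
print(df)
```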