Home > database >  Replace Entire String In Dataframe
Replace Entire String In Dataframe

Time:10-30

I currently have a data frame resembling the below:

     ci_2 ci_2.01    ci_2.02 ci_2.03 ci_2.04 ci_2.05  
    <dbl> <chr>        <dbl>   <dbl>   <dbl> <chr>     
1 -215.   "-0.25905…" -252.   -254.   -254.   "-0.5771…" 
2  522.   "Error : …"  516.    520.    530.   "Error :…"  
3    0    "0"           0       0       0    "-67.768…" 
4   36.2  "Error : …"   36.2    35.2    33.8  "Error :…"   
5  -10.5  "-0.78985…"   -9.20   -3.67   -3.60 "-0.7723…" 

Is there a way that would allow me to detect any string in a column and replace it with 0? I've seen examples with specific strings but was hoping for something more general.

Thanks in advance.

CodePudding user response:

You can remove specific values, such as "Error etc.", across columns in a dataframe in dplyr like so:

library(dplyr)
df %>%
  mutate(across(everything(), ~sub("^Error.*$", "NA", .)))
  ID  b  c d
1  1  x  9 7
2  2  y NA 7
3  3 NA  x 7

Data:

df <- data.frame(ID = c(1, 2, 3),
                 b = c("x", "y", "Error:..."),
                 c = c("9", "Error: xyz", "x"),
                 d = c(7, 7, 7))

CodePudding user response:

In base R, using data from @Chris Ruehlemann

df[-1] <- lapply(df[-1], function(x) {
  x[grepl('Error', x)] <- NA
  x
})

#  ID    b    c d
#1  1    x    9 7
#2  2    y <NA> 7
#3  3 <NA>    x 7

grepl detects if any value has the word 'Error' in it and replace it with NA.

CodePudding user response:

For any string in any column? Maybe:

data1 <- structure(list(ID = c(1, 2, 3),
                        b = c("x", "y", "error"),
                        c = c("9", "z", "x"),
                        d = c(7, 7, 7)), 
                   class = c("tbl_df", "tbl", "data.frame"), 
                   row.names = c(NA, -3L))
data1
#> # A tibble: 3 × 4
#>      ID b     c         d
#>   <dbl> <chr> <chr> <dbl>
#> 1     1 x     9         7
#> 2     2 y     z         7
#> 3     3 error x         7

data.frame(lapply(data1, function(x) {gsub("[[:alpha:]] ", 0, x)}))
#>   ID b c d
#> 1  1 0 9 7
#> 2  2 0 0 7
#> 3  3 0 0 7

Created on 2021-10-29 by the reprex package (v2.0.1)

Or, with tidyverse functions:

library(tidyverse)
data1 %>%
  mutate(across(everything(),
                ~str_replace(.x, "[[:alpha:]] ", "0")))
#> # A tibble: 3 × 4
#>   ID    b     c     d    
#>   <chr> <chr> <chr> <chr>
#> 1 1     0     9     7    
#> 2 2     0     0     7    
#> 3 3     0     0     7   

CodePudding user response:

df <- structure(list(ID = c(1, 2, 3),
                        b = c("-0.25905…", -252., "66..66"),
                        c = c("-67.768…", "Error : …", "33.8"),
                        d = c(0, "9", 7)), 
                   class = c("tbl_df", "tbl", "data.frame"), 
                   row.names = c(NA, -3L))

library(data.table)
library(stringr)

setDT(df)

cols <- names(df)

# try to fix the numbers and have the as.numeric() fallback for where we fail to do so
df[, (cols) := lapply(.SD, function(x) {
  as.numeric(stringr::str_replace_all(x, "[[:punct:][:alpha:]] $", ""))
}), .SDcols = cols]

# not sure if it is legit to consider a true 0 value the same as a replaced "error" to zero
setnafill(df, type = c("const"), fill = 0)
  • Related