I currently have a data frame resembling the below:
ci_2 ci_2.01 ci_2.02 ci_2.03 ci_2.04 ci_2.05
<dbl> <chr> <dbl> <dbl> <dbl> <chr>
1 -215. "-0.25905…" -252. -254. -254. "-0.5771…"
2 522. "Error : …" 516. 520. 530. "Error :…"
3 0 "0" 0 0 0 "-67.768…"
4 36.2 "Error : …" 36.2 35.2 33.8 "Error :…"
5 -10.5 "-0.78985…" -9.20 -3.67 -3.60 "-0.7723…"
Is there a way that would allow me to detect any string in a column and replace it with 0? I've seen examples with specific strings but was hoping for something more general.
Thanks in advance.
CodePudding user response:
You can remove specific values, such as "Error etc.", across columns in a dataframe in dplyr
like so:
library(dplyr)
df %>%
mutate(across(everything(), ~sub("^Error.*$", "NA", .)))
ID b c d
1 1 x 9 7
2 2 y NA 7
3 3 NA x 7
Data:
df <- data.frame(ID = c(1, 2, 3),
b = c("x", "y", "Error:..."),
c = c("9", "Error: xyz", "x"),
d = c(7, 7, 7))
CodePudding user response:
In base R, using data from @Chris Ruehlemann
df[-1] <- lapply(df[-1], function(x) {
x[grepl('Error', x)] <- NA
x
})
# ID b c d
#1 1 x 9 7
#2 2 y <NA> 7
#3 3 <NA> x 7
grepl
detects if any value has the word 'Error'
in it and replace it with NA
.
CodePudding user response:
For any string in any column? Maybe:
data1 <- structure(list(ID = c(1, 2, 3),
b = c("x", "y", "error"),
c = c("9", "z", "x"),
d = c(7, 7, 7)),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L))
data1
#> # A tibble: 3 × 4
#> ID b c d
#> <dbl> <chr> <chr> <dbl>
#> 1 1 x 9 7
#> 2 2 y z 7
#> 3 3 error x 7
data.frame(lapply(data1, function(x) {gsub("[[:alpha:]] ", 0, x)}))
#> ID b c d
#> 1 1 0 9 7
#> 2 2 0 0 7
#> 3 3 0 0 7
Created on 2021-10-29 by the reprex package (v2.0.1)
Or, with tidyverse functions:
library(tidyverse)
data1 %>%
mutate(across(everything(),
~str_replace(.x, "[[:alpha:]] ", "0")))
#> # A tibble: 3 × 4
#> ID b c d
#> <chr> <chr> <chr> <chr>
#> 1 1 0 9 7
#> 2 2 0 0 7
#> 3 3 0 0 7
CodePudding user response:
df <- structure(list(ID = c(1, 2, 3),
b = c("-0.25905…", -252., "66..66"),
c = c("-67.768…", "Error : …", "33.8"),
d = c(0, "9", 7)),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L))
library(data.table)
library(stringr)
setDT(df)
cols <- names(df)
# try to fix the numbers and have the as.numeric() fallback for where we fail to do so
df[, (cols) := lapply(.SD, function(x) {
as.numeric(stringr::str_replace_all(x, "[[:punct:][:alpha:]] $", ""))
}), .SDcols = cols]
# not sure if it is legit to consider a true 0 value the same as a replaced "error" to zero
setnafill(df, type = c("const"), fill = 0)