How to remove erroneous values from a column in R

Time:05-04

To preface, I am new to R and to programming in general.

I am analyzing five columns of data, each of which contains values that must be removed before analysis. The values to remove are all coded as "-8" or "-9".

GROUPEDDATA1$V161081<-gsub("-8","-9","",as.character(GROUPEDDATA1$V161081))

This code replaces every value in the column with "", not just "-8" and "-9". Is there a simpler way to remove these values from the columns?

CodePudding user response:

You can use apply to run a function over every column. In gsub, the pattern you are looking for is -8|-9, which matches either -8 or -9; each match is then replaced with "". You can use the following code:

df <- data.frame(v1 = c("-8", "-9", "10"),
                 v2 = c("2", "4", "-9"))

apply(df, 2, function(x) gsub("-8|-9", "", x))

Output:

     v1   v2 
[1,] ""   "2"
[2,] ""   "4"
[3,] "10" "" 
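Two details may be worth keeping in mind here. gsub takes its arguments in the order pattern, replacement, x, so in the call from the question the third argument "" is treated as the input, which would explain why every value came back as "". Also, the unanchored pattern -8|-9 matches inside longer values, and apply returns a character matrix rather than a data frame. The sketch below is one possible variant, assuming the goal is to blank only cells that are exactly "-8" or "-9"; the data and the column names in cols are made up for illustration.

```r
# Hypothetical data standing in for the asker's columns (names are invented)
df <- data.frame(v1 = c("-8", "-9", "10", "-85"),
                 v2 = c("2", "4", "-9", "18"),
                 stringsAsFactors = FALSE)

cols <- c("v1", "v2")  # the columns to clean

# ^ and $ anchor the pattern to the whole string, so "-85" is left alone.
# With the unanchored pattern, gsub("-8|-9", "", "-85") would give "5".
df[cols] <- lapply(df[cols], function(x) gsub("^-[89]$", "", x))
df
```

Assigning the lapply result back into df[cols] keeps the object a data frame and leaves any other columns untouched.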

CodePudding user response:

Use replace inside lapply. Example:

dat
#   X1 X2 X3
# 1  1  5  9
# 2 -8  6 10
# 3  3  7 -9
# 4  4  8 12

dat[] <- lapply(dat, \(x) replace(x, x %in% c(-8, -9), 9999))
dat
#     X1 X2   X3
# 1    1  5    9
# 2 9999  6   10
# 3    3  7 9999
# 4    4  8   12

Use anything you like instead of 9999.
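For instance, if the -8 and -9 codes stand for missing responses (an assumption, since the question does not say what they mean), NA may be a more convenient replacement than a sentinel number, because most summary functions can then skip those cells. A minimal sketch using the same example data:

```r
# Same example data as in this answer
dat <- data.frame(X1 = c(1, -8, 3, 4),
                  X2 = 5:8,
                  X3 = c(9, 10, -9, 12))

# Replace the codes with NA instead of 9999
dat[] <- lapply(dat, function(x) replace(x, x %in% c(-8, -9), NA))

# NA cells can be skipped explicitly in later calculations
colMeans(dat, na.rm = TRUE)
```

Writing function(x) instead of \(x) does the same thing and also runs on R versions older than 4.1.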


Data:

dat <- structure(list(X1 = c(1, -8, 3, 4), X2 = 5:8, X3 = c(9, 10, -9, 
12)), row.names = c(NA, -4L), class = "data.frame")

CodePudding user response:

Use mutate(across()) to select the columns you want to change, and apply gsub to each of them. I've used a positive lookahead to remove only the "-" that precedes an 8 or 9, and I've required that this "-" be at the beginning of the string.

library(dplyr)  # provides %>%, mutate, across and starts_with

df %>% 
  mutate(across(starts_with("x"), ~gsub("^-(?=[89])","", .x, perl=TRUE)))

Output:

    x1   x2   x3
1   45  815  8-9
2    8   85  898
3  hat  123  129
4 9876 9876    9
5  A-9    9 <NA>

Input:

df = data.frame(x1 =c("45", "-8", "hat", "-9876", "A-9"),
                x2 =c("815", "-85", "123", "9876", "-9"),
                x3 =c("8-9", "-898", "129", "-9", NA)
                )

     x1   x2   x3
1    45  815  8-9
2    -8  -85 -898
3   hat  123  129
4 -9876 9876   -9
5   A-9   -9 <NA>
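If the aim is instead to drop the whole value when a cell is exactly "-8" or "-9", rather than only stripping the minus sign, the same mutate(across()) pattern can be combined with replace. This is a sketch under that assumption; the name cleaned is introduced here for illustration, and NA is used as the replacement only as an example.

```r
library(dplyr)

# Same input as in this answer
df <- data.frame(x1 = c("45", "-8", "hat", "-9876", "A-9"),
                 x2 = c("815", "-85", "123", "9876", "-9"),
                 x3 = c("8-9", "-898", "129", "-9", NA),
                 stringsAsFactors = FALSE)

# %in% compares whole strings, so "-85", "-898" and "-9876" are kept as they are
cleaned <- df %>%
  mutate(across(starts_with("x"), ~ replace(.x, .x %in% c("-8", "-9"), NA)))
cleaned
```

Because %in% returns FALSE for NA inputs, cells that were already missing are simply left as NA.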