I am working with the R programming language.
I have the following dataset:
file = data.frame(id = c(1,2,3,4,5), col1 = c("Red", "Blue", "CCC", "Yellow", "Orange"), col2 = c("AAA", "BBB", "CCC", "DDD", "Red"))
  id   col1 col2
1  1    Red  AAA
2  2   Blue  BBB
3  3    CCC  CCC
4  4 Yellow  DDD
5  5 Orange  Red
For all cells that contain %LIKE% "CCC" or %LIKE% "Red", I would like to replace them with NA. The end result should look something like this:
  id   col1 col2
1  1     NA  AAA
2  2   Blue  BBB
3  3     NA   NA
4  4 Yellow  DDD
5  5 Orange   NA
I found a similar post (Replace entire expression that contains a specific string) and tried to apply the logic presented there to my question:
step1 = file[grep("CCC", file)] <- "NA"
step2 = step1[grep("Red", step1)] <- "NA"
However, I don't think this is working - all I get is an "NA" output.
Can someone please show me how to fix this problem?
Thanks!
CodePudding user response:
I would use ifelse along with %in%, which matches whole cell values exactly:
file$col1 <- ifelse(file$col1 %in% c("CCC", "Red"), NA, file$col1)
file$col2 <- ifelse(file$col2 %in% c("CCC", "Red"), NA, file$col2)
For a substring match, use grepl:
file$col1 <- ifelse(grepl("CCC|Red", file$col1), NA, file$col1)
file$col2 <- ifelse(grepl("CCC|Red", file$col2), NA, file$col2)
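As for why the original attempt returns "NA": a chained assignment such as step1 = file[...] <- "NA" stores the right-hand value (the string "NA") in step1. In addition, grep on a whole data frame matches against each column converted to a single string, so it selects entire columns rather than individual cells. Rather than repeating the assignment for each column, the substring replacement can be applied to every text column at once. The following is only a sketch under some assumptions: replace_matches is a hypothetical helper name, the columns are assumed to be character rather than factor, and a real NA is used instead of the string "NA".

```r
# Rebuild the example data from the question
file <- data.frame(
  id   = c(1, 2, 3, 4, 5),
  col1 = c("Red", "Blue", "CCC", "Yellow", "Orange"),
  col2 = c("AAA", "BBB", "CCC", "DDD", "Red"),
  stringsAsFactors = FALSE  # keep text as character on R < 4.0 as well
)

# Hypothetical helper: set any value containing "CCC" or "Red" to NA
replace_matches <- function(x, pattern = "CCC|Red") {
  x[grepl(pattern, x)] <- NA
  x
}

# Apply only to the character columns, leaving id untouched
is_text <- vapply(file, is.character, logical(1))
file[is_text] <- lapply(file[is_text], replace_matches)

print(file)
```

Note that a substring pattern like "Red" would also match longer values that merely contain it, so the %in% version is the safer choice when only exact values should be removed.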