I have 4 columns in a dataframe:
dd <- data.frame(
A= c(102,101,100,107,55),
B= c(102,101,30,77,55),
C= c("cycle", "walk", "walk", "walk", "drive"),
D= c("English", "French", "English", "English", "Japanese")
)
Using R, I wish to compare if entries in columns A and B matches. If entries in A and B do not match, then I wish to replace their corresponding entries in column C and D with 'NA'.
desired <- data.frame(
A= c(102,101,100,107, 55),
B= c(102,101,30,77, 55),
C= c("cycle", "walk", NA, NA, "drive"),
D= c("English", "French", NA, NA, "Japanese")
)
Can anyone advise? The qn is not new but the proposed solutions are usually in Excel. I hope to do this in R. Thank you in advance!
CodePudding user response:
Here is a dplyr
approach:
Across column C and column D, if A is not equal to B, set NA
, if they are equal, use their original values.
UPDATE: since your column A and B are factors of different levels, you might want to change them into character first, do the operation, then change them back to factor.
library(dplyr)
dd %>% mutate(across(A:B, ~as.character(.x)),
across(C:D, ~ ifelse(A != B, NA, as.character(.x))),
across(A:B, ~as.factor(.x)))
A B C D
1 102 102 cycle English
2 101 101 walk French
3 100 30 <NA> <NA>
4 107 77 <NA> <NA>
5 55 55 drive Japanese
CodePudding user response:
A simple option
df[df$A!=df$B,c("C","D")]=NA
if A and B are factors you could try
df[as.character(df$A)!=as.character(df$B),c("C","D")]=NA