I have these two data frames (imagine them very large):
df = data.frame(subjects = 1:10,
var1 = c('a',NA,'b',NA,'c',NA,'d','e','f','g'))
g = data.frame(subjects = c(1,3,5,7,8,9,10),
score = c(1,2,1,3,2,4,1) )
I want to add the variable score from the g data frame to the df data frame, with the condition that if var1 is NA, the score in df is also NA. How can this be done with a simple function? Thanks.
Second scenario:
df = data.frame(subjects = 1:10,
var1 = c('a','e','b','c','c','b','d','e','f','g'))
g = data.frame(subjects = c(1,3,5,7,8,9,10),
score = c(1,2,1,3,2,4,1) )
Now I want the score to be NA for every subject that has no score in g, so that df becomes:
df = data.frame(subjects = 1:10,
var1 = c('a','e','b','c','c','b','d','e','f','g'),
score = c(1,NA,2,NA,1,NA,3,2,4,1))
CodePudding user response:
We could do a left join by 'subjects', which returns 'score' with NA where there is no corresponding 'subjects' value in 'g'. If we need 'score' to be NA also when 'var1' is NA, do a replace in the next step with an NA check on 'var1'.
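library(dplyr)
# keep every row of df; subjects missing from g get score = NA
df <- left_join(df, g, by = "subjects") %>%
  # also set score to NA wherever var1 is NA
  mutate(score = replace(score, is.na(var1), NA))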
-output
df
   subjects var1 score
1         1    a     1
2         2    e    NA
3         3    b     2
4         4    c    NA
5         5    c     1
6         6    b    NA
7         7    d     3
8         8    e     2
9         9    f     4
10       10    g     1
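If you would rather not load dplyr, the same lookup can be sketched in base R with match(). This is only an alternative illustration, not part of the answer above; it uses the data from the first scenario and assumes each subject appears at most once in g.

```r
# Data from the first scenario in the question
df <- data.frame(subjects = 1:10,
                 var1 = c('a', NA, 'b', NA, 'c', NA, 'd', 'e', 'f', 'g'))
g <- data.frame(subjects = c(1, 3, 5, 7, 8, 9, 10),
                score = c(1, 2, 1, 3, 2, 4, 1))

# match() gives, for each subject in df, its row position in g
# (or NA when the subject is absent), so indexing g$score with it
# yields NA for subjects that have no score
df$score <- g$score[match(df$subjects, g$subjects)]

# Additionally set score to NA wherever var1 is missing
df$score[is.na(df$var1)] <- NA

print(df)
```

Because match() only returns the first matching position, duplicated subjects in g would silently use the first score; a join would instead produce one row per match.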