How to create a variable to a dataset conditioning on missing values and another dataframe at the sa

Time:10-25

I have these two dataframes (imagine them very big) :

df = data.frame(subjects = 1:10,
                var1 = c('a',NA,'b',NA,'c',NA,'d','e','f','g'))

g = data.frame(subjects = c(1,3,5,7,8,9,10),
               score = c(1,2,1,3,2,4,1) )

I want to add the variable score from the g dataframe to the df dataframe, with the condition that if var1 is NA, the score in df is also NA. How can this be done with a simple function? Thanks.

Second scenario :

df = data.frame(subjects = 1:10,
                var1 = c('a','e','b','c','c','b','d','e','f','g'))

g = data.frame(subjects = c(1,3,5,7,8,9,10),
               score = c(1,2,1,3,2,4,1) )

Now I want the score to be NA for every subject whose score was not calculated, so that df becomes:

df = data.frame(subjects = 1:10,
                var1 = c('a','e','b','c','c','b','d','e','f','g'),
                score = c(1,NA,2,NA,1,NA,3,2,4,1))


CodePudding user response:

We could do a left join by 'subjects', which returns 'score' with NA wherever a subject has no corresponding row in 'g'. If 'score' also needs to be NA wherever 'var1' is NA, add a replace step that checks 'var1' for NA.

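library(dplyr)

# Left join keeps every row of df; subjects missing from g get score = NA
df <- left_join(df, g, by = "subjects") %>%
    # Also set score to NA wherever var1 itself is NA
    mutate(score = replace(score, is.na(var1), NA))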

-output

df
   subjects var1 score
1         1    a     1
2         2    e    NA
3         3    b     2
4         4    c    NA
5         5    c     1
6         6    b    NA
7         7    d     3
8         8    e     2
9         9    f     4
10       10    g     1
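
The output above is for the second scenario. For reference, the same two steps can be sketched in base R on the first scenario's data, where var1 contains NA. This is a minimal alternative, not the method given in the answer: it swaps the dplyr join for a match() lookup and assumes each subject appears at most once in g.

```r
# First-scenario data from the question (var1 contains NA)
df <- data.frame(subjects = 1:10,
                 var1 = c('a', NA, 'b', NA, 'c', NA, 'd', 'e', 'f', 'g'))
g <- data.frame(subjects = c(1, 3, 5, 7, 8, 9, 10),
                score = c(1, 2, 1, 3, 2, 4, 1))

# match() gives the row of g for each subject in df, or NA when the
# subject is absent from g, so indexing with it yields NA scores there
df$score <- g$score[match(df$subjects, g$subjects)]

# Force score to NA wherever var1 is missing, even if g has a score
df$score[is.na(df$var1)] <- NA

df
```

In this data the subjects with a missing var1 (2, 4 and 6) are also the ones absent from g, so the second step changes nothing here; it only matters when g holds a score for a subject whose var1 is NA.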