Create columns in one data frame based on whether values exist in another data frame (R)

Time:09-24

I have a df:

df1 <- data.frame(
  sample_id = c('SB024', 'SB026', 'SB027', 'SB026', 'SB027'), 
  VAR = c('10g.76789732G>A','10g.76789732G>A','10g.76789732G>A','11g.102195430G>A','11g.102195430G>A'), 
  REF = c('G','G','G','C','C'),
  ALT = c('A','A','A','T','T'))

df1
  sample_id              VAR REF ALT
1     SB024  10g.76789732G>A   G   A
2     SB026  10g.76789732G>A   G   A
3     SB027  10g.76789732G>A   G   A
4     SB026 11g.102195430G>A   C   T
5     SB027 11g.102195430G>A   C   T

I have made a second data frame with the unique entries of the VAR column:

library(dplyr)
VAR_to_check<-as.data.frame(unique(df1$VAR))
names(VAR_to_check)<-"VAR"
ref<-unique(left_join(VAR_to_check,df1[,c(2:4)],by="VAR"))

ref
               VAR REF ALT
1  10g.76789732G>A   G   A
4 11g.102195430G>A   C   T

I would now like to add columns to the 'ref' dataframe for each sample_id like so:

               VAR REF ALT SB024 SB026 SB027
1  10g.76789732G>A   G   A    NA    NA    NA
4 11g.102195430G>A   C   T    NA    NA    NA

Then for each VAR in 'ref', if the sample_id has an entry in 'df1' then fill with ALT and if not fill with REF. So for this data it would be:

               VAR REF ALT SB024 SB026 SB027
1  10g.76789732G>A   G   A     A     A     A
4 11g.102195430G>A   C   T     C     T     T

I'm very stuck on this. I previously posted a related question, "Ifelse to Compare the content of columns with dplyr", but that wasn't quite right because I need one row per unique VAR entry.

Answer:

Here is one option -

library(dplyr)

result <- df1 %>% distinct(VAR, REF, ALT)

values <- unique(df1$sample_id)

result[values] <- do.call(rbind, Map(function(x, y, z) 
                         ifelse(values %in% df1$sample_id[df1$VAR == z], x, y), 
                         result$ALT, result$REF, result$VAR))
result

#               VAR REF ALT SB024 SB026 SB027
#1  10g.76789732G>A   G   A     A     A     A
#2 11g.102195430G>A   C   T     C     T     T

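Inside Map, for each VAR we return the ALT value for every sample_id that has an entry for that VAR in df1, and the REF value otherwise. This gives a list of character vectors, one per VAR, which do.call(rbind, ...) stacks row-wise into a matrix. That matrix is then assigned to the new sample columns of result.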
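
To make the logic easier to follow, here is a rough base R sketch of the same idea, split into two steps. It is not the code from the answer above but an alternative way of writing it: it first shows what the ifelse() call produces for a single VAR, then builds a logical "presence" matrix for all VAR/sample_id pairs with outer() and fills it in one go. The names samples, carriers, one_row, present and filled are made up for illustration, and the tab separator in paste() assumes that neither VAR nor sample_id contains a tab.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  sample_id = c("SB024", "SB026", "SB027", "SB026", "SB027"),
  VAR = c("10g.76789732G>A", "10g.76789732G>A", "10g.76789732G>A",
          "11g.102195430G>A", "11g.102195430G>A"),
  REF = c("G", "G", "G", "C", "C"),
  ALT = c("A", "A", "A", "T", "T"),
  stringsAsFactors = FALSE
)

# One row per variant (base R counterpart of distinct(VAR, REF, ALT)).
result <- unique(df1[, c("VAR", "REF", "ALT")])
rownames(result) <- NULL

samples <- unique(df1$sample_id)

# Step 1: the per-variant logic, shown for the second variant only.
carriers <- df1$sample_id[df1$VAR == result$VAR[2]]  # samples with this VAR
one_row  <- ifelse(samples %in% carriers, result$ALT[2], result$REF[2])

# Step 2: the same test for every VAR/sample pair at once.
# present[i, j] is TRUE when sample j has an entry for variant i in df1.
# Assumption: no tab characters inside VAR or sample_id values.
present <- outer(
  result$VAR, samples,
  function(v, s) paste(v, s, sep = "\t") %in%
    paste(df1$VAR, df1$sample_id, sep = "\t")
)

# ALT where the sample carries the variant, REF otherwise.
# result$ALT and result$REF are recycled down the columns, so each
# row of the matrix picks up the ALT/REF of its own variant.
filled <- ifelse(present, result$ALT, result$REF)
colnames(filled) <- samples

result[samples] <- filled
result
```

For the example data this should give the same table as the Map version. The matrix form can be handy if you also want to keep the TRUE/FALSE presence information, since present is available on its own before it is turned into ALT/REF letters.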
