I have a df:
df1 <- data.frame(
sample_id = c('SB024', 'SB026', 'SB027', 'SB026', 'SB027'),
VAR = c('10g.76789732G>A','10g.76789732G>A','10g.76789732G>A','11g.102195430G>A','11g.102195430G>A'),
REF = c('G','G','G','C','C'),
ALT = c('A','A','A','T','T'))
df1
sample_id VAR REF ALT
1 SB024 10g.76789732G>A G A
2 SB026 10g.76789732G>A G A
3 SB027 10g.76789732G>A G A
4 SB026 11g.102195430G>A C T
5 SB027 11g.102195430G>A C T
I have made a second data frame with the unique entries of the VAR column:
library(dplyr)
VAR_to_check<-as.data.frame(unique(df1$VAR))
names(VAR_to_check)<-"VAR"
ref<-unique(left_join(VAR_to_check,df1[,c(2:4)],by="VAR"))
ref
VAR REF ALT
1 10g.76789732G>A G A
4 11g.102195430G>A C T
I would now like to add columns to the 'ref' dataframe for each sample_id like so:
VAR REF ALT SB024 SB026 SB027
1 10g.76789732G>A G A NA NA NA
4 11g.102195430G>A C T NA NA NA
Then for each VAR in 'ref', if the sample_id has an entry in 'df1' then fill with ALT and if not fill with REF. So for this data it would be:
VAR REF ALT SB024 SB026 SB027
1 10g.76789732G>A G A A A A
4 11g.102195430G>A C T C T T
I'm very stuck on this. I previously posted a question, "Ifelse to Compare the content of columns with dplyr", but that wasn't quite right because I need to work from the unique VAR entries.
CodePudding user response:
Here is one option -
library(dplyr)
result <- df1 %>% distinct(VAR, REF, ALT)
values <- unique(df1$sample_id)
result[values] <- do.call(rbind, Map(function(x, y, z)
ifelse(values %in% df1$sample_id[df1$VAR == z], x, y),
result$ALT, result$REF, result$VAR))
result
# VAR REF ALT SB024 SB026 SB027
#1 10g.76789732G>A G A A A A
#2 11g.102195430G>A C T C T T
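Inside Map, for each VAR we return ALT for every sample in values that has a row for that VAR in df1 (checked against df1$sample_id), and REF otherwise. This gives a list with one character vector per variant; do.call(rbind, ...) stacks them into a matrix with one row per variant, and its columns are then assigned to result as the new sample columns.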
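As a possible alternative, the same presence check can be sketched in base R without Map, by building a logical variant-by-sample matrix first. This is only a sketch under the assumption that a (VAR, sample_id) pair counts as "present" whenever it appears at least once in df1; the names samples, keys and present are illustrative and not part of the original code.

```r
# Example data from the question
df1 <- data.frame(
  sample_id = c('SB024', 'SB026', 'SB027', 'SB026', 'SB027'),
  VAR = c('10g.76789732G>A', '10g.76789732G>A', '10g.76789732G>A',
          '11g.102195430G>A', '11g.102195430G>A'),
  REF = c('G', 'G', 'G', 'C', 'C'),
  ALT = c('A', 'A', 'A', 'T', 'T'))

# One row per distinct variant, and the sample IDs that become columns
result  <- unique(df1[, c("VAR", "REF", "ALT")])
samples <- unique(df1$sample_id)

# present[i, j] is TRUE when sample j has a row for variant i in df1
# (assumes VAR and sample_id contain no spaces, so the pasted key is unambiguous)
keys    <- paste(df1$VAR, df1$sample_id)
present <- outer(result$VAR, samples, function(v, s) paste(v, s) %in% keys)

# ifelse() keeps the matrix shape and recycles ALT/REF down each column,
# so row i always receives the ALT or REF of variant i
result[samples] <- ifelse(present, result$ALT, result$REF)
result
```

The recycling in ifelse() only lines up because result has exactly one row per variant, so the length of result$ALT equals the number of rows in present.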