I have a data frame with multiple columns and rows. I want to compare the values in column 7 with the headers of columns 1, 2, 4 and 5 and, where they match, print the sequence from that column in a new column. The common pattern between the column headers and the row values is the .x or .y suffix.
My data frame looks like this:
F_20TP53_Seq.x F_30TP53_Seq.x R_20TP53_Seq.x F_20TP53_Seq.y F_30TP53_Seq.y R_20TP53_Seq.y Name_of_F_TP53
CACTGT CAAAGT CATAGT AATGTTG CACAGT CAAAGT F_20TP53_Max_score.y
CACAGT CACTGT CACAGT CCAAGG CATAGT CACTGT F_30TP53_Max_score.y
CATAGT AATGTTG CACAG GCCAGG CACAGT CACTGT F_20TP53_Max_score.x
CACAGT CCAAGG CACCAT CAAAGT CACAG CACAGT F_30TP53_Max_score.x
CACTGT CACAGT CCAAGG CACTGT CACCAT CATAGT F_30TP53_Max_score.y
And my expected output looks like this:
F_20TP53_Seq.x F_30TP53_Seq.x R_20TP53_Seq.x F_20TP53_Seq.y F_30TP53_Seq.y R_20TP53_Seq.y Name_of_F_TP53 F_20TP53_Seq.x F_30TP53_Seq.x F_20TP53_Seq.y F_30TP53_Seq.y
CACTGT CAAAGT CATAGT AATGTTG CACAGT CAAAGT F_20TP53_Max_score.y NA NA AATGTTG CACAGT
CACAGT CACTGT CACAGT CCAAGG CATAGT CACTGT F_30TP53_Max_score.y NA NA CCAAGG CATAGT
CATAGT AATGTTG CACAG GCCAGG CACAGT CACTGT F_20TP53_Max_score.x CATAGT AATGTTG NA NA
CACAGT CCAAGG CACCAT CAAAGT CACAG CACAGT F_30TP53_Max_score.x CACAGT CCAAGG NA NA
CACTGT CACAGT CCAAGG CACTGT CACCAT CATAGT F_30TP53_Max_score.y NA NA CACTGT CACCAT
CodePudding user response:
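I use the stringr package below to build, for each target column, a logical vector indicating whether each row's Name_of_F_TP53 ends in the same suffix (x or y) as the column header. Rows that match keep their sequence; the others become NA.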
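library(stringr)

# Append a filtered copy of columns 1, 2, 4 and 5 to d
cbind(
  d,
  setNames(
    lapply(c(1, 2, 4, 5), function(x) {
      # Suffix of this column's header ("x" or "y"), anchored to the end of the string
      key <- paste0(str_extract(colnames(d)[x], "x|y"), "$")
      # TRUE for rows whose Name_of_F_TP53 ends in the same suffix
      k <- str_detect(d$Name_of_F_TP53, key)
      # Keep the sequence where the suffix matches, NA elsewhere
      sapply(seq_along(k), function(l) ifelse(k[l], d[l, x], NA))
    }),
    colnames(d)[c(1, 2, 4, 5)]
  )
)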
Output:
F_20TP53_Seq.x F_30TP53_Seq.x R_20TP53_Seq.x F_20TP53_Seq.y F_30TP53_Seq.y R_20TP53_Seq.y Name_of_F_TP53
1 CACTGT CAAAGT CATAGT AATGTTG CACAGT CAAAGT F_20TP53_Max_score.y
2 CACAGT CACTGT CACAGT CCAAGG CATAGT CACTGT F_30TP53_Max_score.y
3 CATAGT AATGTTG CACAG GCCAGG CACAGT CACTGT F_20TP53_Max_score.x
4 CACAGT CCAAGG CACCAT CAAAGT CACAG CACAGT F_30TP53_Max_score.x
5 CACTGT CACAGT CCAAGG CACTGT CACCAT CATAGT F_30TP53_Max_score.y
F_20TP53_Seq.x F_30TP53_Seq.x F_20TP53_Seq.y F_30TP53_Seq.y
1 <NA> <NA> AATGTTG CACAGT
2 <NA> <NA> CCAAGG CATAGT
3 CATAGT AATGTTG <NA> <NA>
4 CACAGT CCAAGG <NA> <NA>
5 <NA> <NA> CACTGT CACCAT
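The same result can also be sketched in base R without the per-row sapply. This is only one possible alternative, not the approach used above. It assumes the columns of d are character vectors (not factors), rebuilds d from the question's sample data so the snippet runs on its own, and uses a hypothetical "matched_" prefix so the new columns do not duplicate the existing names.

```r
# Rebuild the example data frame from the question (assumed character columns).
d <- data.frame(
  F_20TP53_Seq.x = c("CACTGT", "CACAGT", "CATAGT", "CACAGT", "CACTGT"),
  F_30TP53_Seq.x = c("CAAAGT", "CACTGT", "AATGTTG", "CCAAGG", "CACAGT"),
  R_20TP53_Seq.x = c("CATAGT", "CACAGT", "CACAG", "CACCAT", "CCAAGG"),
  F_20TP53_Seq.y = c("AATGTTG", "CCAAGG", "GCCAGG", "CAAAGT", "CACTGT"),
  F_30TP53_Seq.y = c("CACAGT", "CATAGT", "CACAGT", "CACAG", "CACCAT"),
  R_20TP53_Seq.y = c("CAAAGT", "CACTGT", "CACTGT", "CACAGT", "CATAGT"),
  Name_of_F_TP53 = c("F_20TP53_Max_score.y", "F_30TP53_Max_score.y",
                     "F_20TP53_Max_score.x", "F_30TP53_Max_score.x",
                     "F_30TP53_Max_score.y"),
  stringsAsFactors = FALSE
)

cols <- c(1, 2, 4, 5)

# Text after the last dot: "x" or "y" for each row name and each selected header.
row_suffix <- sub(".*\\.", "", d$Name_of_F_TP53)
col_suffix <- sub(".*\\.", "", colnames(d)[cols])

matched <- d[cols]
# outer() gives a rows-by-columns logical matrix; TRUE marks a suffix mismatch,
# and those cells are blanked out.
matched[outer(row_suffix, col_suffix, "!=")] <- NA

# Hypothetical prefix, chosen here only to keep column names unique.
names(matched) <- paste0("matched_", names(matched))
result <- cbind(d, matched)
print(result)
```

Because the comparison is done once on the whole matrix, this avoids looping over rows, and the unique column names make the new columns addressable with $ afterwards.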