I want to create a variable ori.same.maf.barcodes that stores the strings of ori.maf.barcode whose substring before the fourth "-" character matches one of the strings in sub.same.barcodes.
sub.same.barcodes
"TCGA-BQ-7058-01A" "TCGA-DZ-6131-02A"
"TCGA-UZ-A9PZ-03A" "TCGA-2Z-A9JQ-01A"
"TCGA-BQ-5887-11A" "TCGA-2Z-A9JQ-01A"
ori.maf.barcode
example:
"TCGA-BQ-7058-01A-11D-1963-05" "TCGA-DZ-6131-01A-11D-1963-05"
"TCGA-UZ-A9PZ-01A-11D-A42K-05" "TCGA-2Z-A9JQ-01A-11D-A42K-05"
"TCGA-BQ-5887-11A-01D-1963-05" "TCGA-G7-7502-01A-12D-A43K-06"
Expected output:
ori.same.maf.barcodes
"TCGA-BQ-7058-01A-11D-1963-05"
"TCGA-2Z-A9JQ-01A-11D-A42K-05"
"TCGA-BQ-5887-11A-01D-1963-05"
"TCGA-G7-7502-01A-12D-A43K-06"
Attempt:
ori.same.maf.barcodes <- ori.maf.barcode %in% sub.same.barcodes
But my code returns "FALSE" instead of a character vector.
CodePudding user response:
Please note that with the sample data you have provided, it is not possible for the value TCGA-G7-7502-01A-12D-A43K-06 to appear in the output, because its prefix TCGA-G7-7502-01A is not in sub.same.barcodes.
library(stringr)
sub.same.barcodes <- c("TCGA-BQ-7058-01A", "TCGA-DZ-6131-02A", "TCGA-UZ-A9PZ-03A",
"TCGA-2Z-A9JQ-01A", "TCGA-BQ-5887-11A", "TCGA-2Z-A9JQ-01A")
ori.maf.barcode <- c("TCGA-BQ-7058-01A-11D-1963-05", "TCGA-DZ-6131-01A-11D-1963-05",
"TCGA-UZ-A9PZ-01A-11D-A42K-05", "TCGA-2Z-A9JQ-01A-11D-A42K-05",
"TCGA-BQ-5887-11A-01D-1963-05", "TCGA-G7-7502-01A-12D-A43K-06")
# str_extract() returns a character vector with the first four fields of each barcode
idx <- str_extract(ori.maf.barcode, '^.{4}-.{2}-.{4}-.{3}') %in% sub.same.barcodes
ori.same.maf.barcodes <- ori.maf.barcode[idx]
print(ori.same.maf.barcodes)
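For comparison, here is a minimal base R sketch of the same idea without stringr. It assumes every barcode starts with the fixed-width 16-character prefix seen in the sample data (4-2-4-3 characters plus three dashes); the name prefix16 is only illustrative.

```r
sub.same.barcodes <- c("TCGA-BQ-7058-01A", "TCGA-DZ-6131-02A", "TCGA-UZ-A9PZ-03A",
                       "TCGA-2Z-A9JQ-01A", "TCGA-BQ-5887-11A", "TCGA-2Z-A9JQ-01A")
ori.maf.barcode <- c("TCGA-BQ-7058-01A-11D-1963-05", "TCGA-DZ-6131-01A-11D-1963-05",
                     "TCGA-UZ-A9PZ-01A-11D-A42K-05", "TCGA-2Z-A9JQ-01A-11D-A42K-05",
                     "TCGA-BQ-5887-11A-01D-1963-05", "TCGA-G7-7502-01A-12D-A43K-06")

# Assumption: the part before the fourth "-" is always exactly 16 characters long
prefix16 <- substr(ori.maf.barcode, 1, 16)

# Keep the full barcodes whose 16-character prefix appears in sub.same.barcodes
ori.same.maf.barcodes <- ori.maf.barcode[prefix16 %in% sub.same.barcodes]
print(ori.same.maf.barcodes)
```

If the field widths can vary, splitting on "-" or using a regular expression is the safer choice.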
CodePudding user response:
We could use sub to extract the substring up to the fourth - and then use %in% to create a logical vector for subsetting
# Keep the first four "-"-separated fields, without a trailing "-"
i1 <- sub("^(([^-]+-){3}[^-]+).*", "\\1", ori.maf.barcode) %in%
sub("^(([^-]+-){3}[^-]+).*", "\\1", sub.same.barcodes)
ori.same.maf.barcodes <- ori.maf.barcode[i1]
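If the regular expression is hard to read, the same "first four fields" idea could be sketched with strsplit instead. This is an alternative technique, not the one used above, and the names first_four and keep are only illustrative.

```r
sub.same.barcodes <- c("TCGA-BQ-7058-01A", "TCGA-DZ-6131-02A", "TCGA-UZ-A9PZ-03A",
                       "TCGA-2Z-A9JQ-01A", "TCGA-BQ-5887-11A", "TCGA-2Z-A9JQ-01A")
ori.maf.barcode <- c("TCGA-BQ-7058-01A-11D-1963-05", "TCGA-DZ-6131-01A-11D-1963-05",
                     "TCGA-UZ-A9PZ-01A-11D-A42K-05", "TCGA-2Z-A9JQ-01A-11D-A42K-05",
                     "TCGA-BQ-5887-11A-01D-1963-05", "TCGA-G7-7502-01A-12D-A43K-06")

# Split each barcode on "-" and glue the first four fields back together
first_four <- vapply(
  strsplit(ori.maf.barcode, "-", fixed = TRUE),
  function(parts) paste(head(parts, 4), collapse = "-"),
  character(1)
)

# Logical vector: TRUE where the rebuilt prefix is one of the wanted barcodes
keep <- first_four %in% sub.same.barcodes
ori.same.maf.barcodes <- ori.maf.barcode[keep]
print(ori.same.maf.barcodes)
```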
CodePudding user response:
You're almost there, but your code ori.maf.barcode %in% sub.same.barcodes evaluates to a logical vector of TRUE and FALSE values, which is what you are seeing. To get back the values that correspond to TRUE, you need to use that expression to subset the original vector.
ori.maf.barcode[which(ori.maf.barcode %in% sub.same.barcodes)]
If ori.maf.barcode is a vector, this returns another vector containing only the entries for which the logical test is TRUE.
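A small sketch of what that means on data shaped like the question's: exact matching with %in% compares whole strings, so a full barcode never equals a shorter prefix and the subset comes back empty. The vectors barcodes and prefixes below are shortened, illustrative stand-ins for ori.maf.barcode and sub.same.barcodes.

```r
barcodes <- c("TCGA-BQ-7058-01A-11D-1963-05", "TCGA-DZ-6131-01A-11D-1963-05")
prefixes <- c("TCGA-BQ-7058-01A", "TCGA-DZ-6131-02A")

# %in% tests whole-string equality, so every comparison fails here
mask <- barcodes %in% prefixes
print(mask)

# Subsetting with an all-FALSE mask yields an empty character vector
picked <- barcodes[mask]
print(picked)
```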
You also need to match on the start of each string to select entries by their first part, as iod said below.
This loop picks them out one prefix at a time and adds them to a new vector:
new.barcodes <- c()
# unique() avoids collecting the same barcode twice for a repeated prefix
for (prefix in unique(sub.same.barcodes)) {
  new <- ori.maf.barcode[startsWith(ori.maf.barcode, prefix)]
  new.barcodes <- c(new.barcodes, new)
}
This will iterate through your prefixes and pull out what you want into a new vector.
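As a possible variation, the loop could be replaced by a single logical mask, which keeps the barcodes in their original order and cannot return the same barcode twice. This is only a sketch; the name keep is illustrative.

```r
sub.same.barcodes <- c("TCGA-BQ-7058-01A", "TCGA-DZ-6131-02A", "TCGA-UZ-A9PZ-03A",
                       "TCGA-2Z-A9JQ-01A", "TCGA-BQ-5887-11A", "TCGA-2Z-A9JQ-01A")
ori.maf.barcode <- c("TCGA-BQ-7058-01A-11D-1963-05", "TCGA-DZ-6131-01A-11D-1963-05",
                     "TCGA-UZ-A9PZ-01A-11D-A42K-05", "TCGA-2Z-A9JQ-01A-11D-A42K-05",
                     "TCGA-BQ-5887-11A-01D-1963-05", "TCGA-G7-7502-01A-12D-A43K-06")

# For each barcode, check whether it starts with any of the prefixes;
# startsWith() recycles the single barcode against the vector of prefixes
keep <- vapply(
  ori.maf.barcode,
  function(barcode) any(startsWith(barcode, sub.same.barcodes)),
  logical(1),
  USE.NAMES = FALSE
)

new.barcodes <- ori.maf.barcode[keep]
print(new.barcodes)
```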