Total newb here. Please explain how on Earth this line works; I understand the rest:
gene_symbol <- id2symbol$gene_symbol[id2symbol$Ensembl == gene_id]
How does the ==, which as far as I know equals TRUE, work in this case? Or does it mean something else here? Thank you ever so much!
cancer_genes <- c("ENSG00000139618", "ENSG00000106462", "ENSG00000116288")
id2symbol <- data.frame(
  "Ensembl" = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  "gene_symbol" = c("TP53", "BRCA2", "EZH2", "PARK7")
)
gene_id_converter <- function(gene_id) {
  gene_symbol <- id2symbol$gene_symbol[id2symbol$Ensembl == gene_id]
  return(gene_symbol)
}
gene_id_converter(gene_id="ENSG00000141510")
CodePudding user response:
With this function, we can either wrap it in Vectorize() or loop over the elements (for example with sapply) to get the values:
sapply(cancer_genes, gene_id_converter)
-output
ENSG00000139618 ENSG00000106462 ENSG00000116288
        "BRCA2"          "EZH2"         "PARK7"
== is an elementwise comparison operator, i.e. the lhs and rhs should either have the same length, or one side can have length 1, in which case it is recycled against every element of the other side. The output of == is a logical vector of TRUE/FALSE values, which is then used to subset the corresponding values from id2symbol$gene_symbol.
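To make the two steps visible, here is a minimal sketch that splits the line from the question into the comparison and the subsetting. The intermediate name mask is only for illustration and is not in the original code; stringsAsFactors = FALSE is set explicitly so the sketch behaves the same on older R versions.

```r
id2symbol <- data.frame(
  Ensembl = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  gene_symbol = c("TP53", "BRCA2", "EZH2", "PARK7"),
  stringsAsFactors = FALSE
)
gene_id <- "ENSG00000141510"

# Step 1: compare every Ensembl ID with the single gene_id.
# gene_id has length 1, so it is recycled against all four IDs.
mask <- id2symbol$Ensembl == gene_id
print(mask)

# Step 2: keep only the gene symbols at the positions where mask is TRUE.
gene_symbol <- id2symbol$gene_symbol[mask]
print(gene_symbol)
```

So == here does not "equal TRUE"; it returns one TRUE or FALSE per row, and the square brackets keep the rows marked TRUE.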
Thus, if we pass more than one element to the function, the two sides of == have different lengths, and recycling can give unexpected results:
> id2symbol$Ensembl == cancer_genes[1]
[1] FALSE  TRUE FALSE FALSE
> id2symbol$Ensembl == cancer_genes
[1] FALSE FALSE FALSE FALSE
Warning message:
In id2symbol$Ensembl == cancer_genes :
  longer object length is not a multiple of shorter object length
Thus, by looping over cancer_genes, each call compares a single element, which is recycled against id2symbol$Ensembl and gives back a logical TRUE/FALSE vector; subsetting with that vector returns the id2symbol$gene_symbol values at the positions that are TRUE.
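The Vectorize option mentioned at the start of this answer could look roughly like the sketch below. The name gene_id_converter_vec is made up for illustration; the data and the function are copied from the question, with stringsAsFactors = FALSE added so it also works on older R versions.

```r
cancer_genes <- c("ENSG00000139618", "ENSG00000106462", "ENSG00000116288")
id2symbol <- data.frame(
  Ensembl = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  gene_symbol = c("TP53", "BRCA2", "EZH2", "PARK7"),
  stringsAsFactors = FALSE
)
gene_id_converter <- function(gene_id) {
  id2symbol$gene_symbol[id2symbol$Ensembl == gene_id]
}

# Vectorize() returns a wrapper that calls gene_id_converter once per
# element of gene_id, so each comparison again sees a length-1 value.
gene_id_converter_vec <- Vectorize(gene_id_converter)

symbols <- gene_id_converter_vec(cancer_genes)
print(symbols)
```

This should give the same symbols as the sapply call, since both approaches run the function on one ID at a time.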