R, mapping items in a data frame

Time:11-29

Total newbie here. Please explain how on Earth this line works; I understand the rest:

 gene_symbol <- id2symbol$gene_symbol[id2symbol$Ensembl == gene_id]

How does ==, which as far as I know evaluates to TRUE, work in this case? Or does it mean something else here? Thank you ever so much!

cancer_genes <- c("ENSG00000139618", "ENSG00000106462", "ENSG00000116288")

id2symbol <- data.frame(
  "Ensembl" = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  "gene_symbol" = c("TP53", "BRCA2", "EZH2", "PARK7")
)

gene_id_converter <- function(gene_id) {
  gene_symbol <- id2symbol$gene_symbol[id2symbol$Ensembl == gene_id]
  return(gene_symbol)
}

gene_id_converter(gene_id="ENSG00000141510")

Answer:

With this function, we can either use Vectorize() or loop over the elements (for example with sapply) to get the values:

sapply(cancer_genes, gene_id_converter)

Output:

ENSG00000139618 ENSG00000106462 ENSG00000116288 
        "BRCA2"          "EZH2"         "PARK7" 
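The Vectorize() route mentioned above could look roughly like the sketch below. It reuses the question's data; the name gene_id_converter_vec is purely illustrative. Vectorize() returns a wrapper that feeds the elements of gene_id to the original function one at a time (internally through mapply), so it should give the same named vector as the sapply call.

```r
# Same lookup table as in the question
id2symbol <- data.frame(
  Ensembl = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  gene_symbol = c("TP53", "BRCA2", "EZH2", "PARK7"),
  stringsAsFactors = FALSE  # keeps character columns on R versions before 4.0 too
)
cancer_genes <- c("ENSG00000139618", "ENSG00000106462", "ENSG00000116288")

gene_id_converter <- function(gene_id) {
  id2symbol$gene_symbol[id2symbol$Ensembl == gene_id]
}

# Hypothetical name: a vectorized wrapper that calls gene_id_converter
# once per element of its gene_id argument
gene_id_converter_vec <- Vectorize(gene_id_converter)

symbols_vec <- gene_id_converter_vec(cancer_genes)
symbols_sapply <- sapply(cancer_genes, gene_id_converter)

print(symbols_vec)
# Both approaches produce a character vector named by the input IDs
print(identical(symbols_vec, symbols_sapply))
```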

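== is an elementwise comparison operator: the left-hand side (lhs) and right-hand side (rhs) should either have the same length, or one of them (here the rhs) should have length 1, in which case it is recycled to the length of the other. So == does not return a single TRUE; it returns a logical vector with one TRUE/FALSE per element. That vector is then used to subset id2symbol$gene_symbol: only the positions that are TRUE are kept.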
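To make the two steps visible, the line from the question can be split up as in this sketch, using the question's own example ID. The intermediate name is_match is only for illustration.

```r
id2symbol <- data.frame(
  Ensembl = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  gene_symbol = c("TP53", "BRCA2", "EZH2", "PARK7"),
  stringsAsFactors = FALSE
)
gene_id <- "ENSG00000141510"

# Step 1: compare every Ensembl ID with gene_id (length 1, so it is recycled).
# is_match (illustrative name) is c(TRUE, FALSE, FALSE, FALSE): one value per row.
is_match <- id2symbol$Ensembl == gene_id
print(is_match)

# Step 2: index gene_symbol with the logical vector; only TRUE positions are kept
gene_symbol <- id2symbol$gene_symbol[is_match]
print(gene_symbol)

# Writing the logical vector out by hand selects the same element
print(id2symbol$gene_symbol[c(TRUE, FALSE, FALSE, FALSE)])
```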

Thus, if we pass more than one ID to the function at once, the lengths differ and recycling can give unexpected results:

> id2symbol$Ensembl == cancer_genes[1]
[1] FALSE  TRUE FALSE FALSE
> id2symbol$Ensembl == cancer_genes
[1] FALSE FALSE FALSE FALSE
Warning message:
In id2symbol$Ensembl == cancer_genes :
  longer object length is not a multiple of shorter object length
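To see why every comparison above is FALSE, the recycling can be spelled out by hand with rep_len(), as in the sketch below (the name recycled is illustrative). The three IDs are repeated to length 4, so each Ensembl ID ends up paired with a different cancer gene ID and no pair matches.

```r
id2symbol <- data.frame(
  Ensembl = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  gene_symbol = c("TP53", "BRCA2", "EZH2", "PARK7"),
  stringsAsFactors = FALSE
)
cancer_genes <- c("ENSG00000139618", "ENSG00000106462", "ENSG00000116288")

# Recycle the length-3 vector to length 4, as == does internally
recycled <- rep_len(cancer_genes, nrow(id2symbol))

# Show which IDs actually get compared with each other, row by row
print(data.frame(Ensembl = id2symbol$Ensembl, compared_with = recycled))

# Comparing the aligned vectors gives the same all-FALSE result, without the warning
same_result <- id2symbol$Ensembl == recycled
print(same_result)
```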

Thus, by looping over cancer_genes, each call compares id2symbol$Ensembl with a single ID, which is recycled. The resulting logical vector is TRUE only where the IDs match, and it selects the corresponding element of id2symbol$gene_symbol.
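As an aside, and not the approach used in this answer, base R's match() can do the whole lookup in one vectorized call without recycling problems. A minimal sketch follows; idx, symbols and the ID "ENSG_NOT_PRESENT" are hypothetical, made up only for illustration.

```r
id2symbol <- data.frame(
  Ensembl = c("ENSG00000141510", "ENSG00000139618", "ENSG00000106462", "ENSG00000116288"),
  gene_symbol = c("TP53", "BRCA2", "EZH2", "PARK7"),
  stringsAsFactors = FALSE
)
cancer_genes <- c("ENSG00000139618", "ENSG00000106462", "ENSG00000116288")

# match() gives, for each query ID, the row position of its first exact
# match in id2symbol$Ensembl (NA if there is none): here c(2, 3, 4)
idx <- match(cancer_genes, id2symbol$Ensembl)
symbols <- id2symbol$gene_symbol[idx]
names(symbols) <- cancer_genes  # name by input ID, like the sapply output
print(symbols)

# A made-up ID that is not in the table yields NA rather than character(0)
print(match("ENSG_NOT_PRESENT", id2symbol$Ensembl))
```

One design difference worth noting: with the == version, an unknown ID makes gene_id_converter return character(0), and sapply then cannot simplify and returns a list, whereas match() keeps one NA per missing ID.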
