R - if column value matches vector item, take value from second vector

Time:07-20

I have the following table:

library( tidyverse )
data = read.table(text="gene1
           gene2
           gene3", sep="\t", col.names = c("Protein"))

And the following two vectors:

genes = c("gene1", "gene3")
genes_names = c("name1", "name3")

Each item in genes_names corresponds to the item in genes at the same index.

Now I want to add a new column to data called ToLabel that holds the matching item from genes_names when the value in data$Protein appears in genes, and "no" otherwise.

data %>% mutate( ToLabel = ifelse( Protein %in% genes, genes_names, "no" ) )

This does not work as expected. My expected outcome:

Protein ToLabel
gene1   name1
gene2   no
gene3   name3
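
A minimal sketch of why the ifelse() attempt goes wrong, assuming the Protein values have already been trimmed of whitespace (the names protein, by_position and by_value are illustrative and not part of the original post). ifelse() recycles its yes argument by position, so it does not look values up in genes:

```r
# Trimmed stand-ins for the question's data (assumption: no leading spaces).
protein     <- c("gene1", "gene2", "gene3")
genes       <- c("gene1", "gene3")
genes_names <- c("name1", "name3")

# `yes` is recycled to length 3 as c("name1", "name3", "name1"),
# so the third row picks up "name1" rather than "name3".
by_position <- ifelse(protein %in% genes, genes_names, "no")

# Indexing genes_names through match() aligns each protein by value instead.
by_value <- ifelse(protein %in% genes, genes_names[match(protein, genes)], "no")

print(by_position)
print(by_value)
```

With the untrimmed data from read.table, the leading spaces would additionally stop gene2 and gene3 from matching at all, which is why the answers below trim Protein first.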

CodePudding user response:

Use recode:

data %>%
  mutate(Protein = str_squish(Protein),
    ToLabel = recode(Protein, !!!set_names(genes_names, genes), .default = 'no'))

  Protein ToLabel
1   gene1   name1
2   gene2      no
3   gene3   name3

CodePudding user response:

Use a join if we want to replace multiple values by matching:

library(dplyr)
data %>%
   mutate(Protein = trimws(Protein)) %>%
   left_join(tibble(Protein = genes, ToLabel = genes_names), by = "Protein") %>%
   mutate(ToLabel = coalesce(ToLabel, "no"))

Output:

  Protein ToLabel
1   gene1   name1
2   gene2      no
3   gene3   name3

CodePudding user response:

You can use your code with some modifications:

library( tidyverse )

# rowwise() makes Protein a single value per row, so which() returns at most one index
data |> rowwise() |>
  mutate(Protein = trimws(Protein),
         ToLabel = ifelse(Protein %in% genes, genes_names[which(Protein == genes)], "no")) |>
  ungroup()

Output:
# A tibble: 3 × 2
  Protein ToLabel
  <chr>   <chr>  
1 gene1   name1  
2 gene2   no     
3 gene3   name3  

CodePudding user response:

A base R option using merge and replace:

transform(
  merge(
    transform(data, Protein = trimws(Protein)),
    data.frame(
      genes = c("gene1", "gene3"),
      genes_names = c("name1", "name3")
    ),
    by.x = "Protein",
    by.y = "genes",
    all.x = TRUE
  ),
  genes_names = replace(genes_names, is.na(genes_names), "no")
)

gives

  Protein genes_names
1   gene1       name1
2   gene2          no
3   gene3       name3
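
One thing to keep in mind with merge(): by default it sorts the result on the key column, so rows can come back in a different order when the input is not already sorted. The sketch below is one possible way to restore the original order, assuming a temporary helper column (row_id and lookup are hypothetical names introduced here, and the input order is made up for illustration):

```r
# Illustrative input in a non-sorted order (not the question's exact data).
data <- data.frame(Protein = c("gene3", "gene1", "gene2"),
                   stringsAsFactors = FALSE)
lookup <- data.frame(Protein = c("gene1", "gene3"),
                     ToLabel = c("name1", "name3"),
                     stringsAsFactors = FALSE)

# Remember the original row positions before merging.
data$row_id <- seq_len(nrow(data))

# Left join; merge() sorts the rows by Protein here.
merged <- merge(data, lookup, by = "Protein", all.x = TRUE)

# Restore the input order, fill unmatched labels, drop the helper column.
merged <- merged[order(merged$row_id), ]
merged$ToLabel[is.na(merged$ToLabel)] <- "no"
merged$row_id <- NULL

print(merged)
```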

CodePudding user response:

You can use match():

ToLabel <- genes_names[match(trimws(data$Protein), genes)]
ToLabel[is.na(ToLabel)] <- "no"

data$ToLabel <- ToLabel
data
#>            Protein ToLabel
#> 1            gene1   name1
#> 2            gene2      no
#> 3            gene3   name3
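
If the same lookup is needed in several places, the match() approach could be wrapped in a small helper. This is only a sketch: lookup_label and its arguments are hypothetical names, not an existing function.

```r
# Hypothetical helper: map each key to the value in `to` that sits at the
# same index as the key's match in `from`; unmatched keys get `default`.
lookup_label <- function(keys, from, to, default = "no") {
  stopifnot(length(from) == length(to))
  out <- to[match(trimws(keys), from)]  # NA where a key has no match
  out[is.na(out)] <- default
  out
}

genes       <- c("gene1", "gene3")
genes_names <- c("name1", "name3")

# Keys with leading whitespace, as in the question's data.
labels <- lookup_label(c("gene1", "   gene2", "   gene3"), genes, genes_names)
print(labels)
```

It can then be used as data$ToLabel <- lookup_label(data$Protein, genes, genes_names).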