Convert df of characters into specific numbers

Time:01-16

General:

Convert the character columns of a data frame into numbers (for plotting as a heat map).

Specific:

I collected annotations for different genes and found that they disagree in many cases. I would now like to visualise this as a heat map, which requires converting the character vectors of the annotations into numbers. I tried converting them to factors, but that gave me no control over which string is assigned to which number. Since I need to control this mapping, the factor conversion did not deliver the desired result.

Start DF:

df_char <- data.frame(
 id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
 annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
 annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
 annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

Desired result:

df_num <- data.frame(
 id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
 annoA = c(1, 2, 3, 1, NA),
 annoB = c(1, 1, 3, 3, 3),
 annoC = c(1, 2, 2, 1, NA)
)

I experimented with an ifelse() function, but to no avail:

granule_coverter <- function(df, col) {
 df$col <- ifelse(df$col == 'primary', 1, df$col)
 df$col <- ifelse(df$col == 'secondary', 2, df$col)
 df$col <- ifelse(df$col == 'tertiary', 3, df$col)
 df$col <- ifelse(df$col == 'ficolin-1', 4, df$col)
 df$col <- ifelse(df$col == 'secretory', 5, df$col)
 return(df)
}

CodePudding user response:

You can use match():

library(dplyr)

df_char %>%
  mutate(across(starts_with("anno"),
         ~ match(.x, c('primary', 'secondary', 'tertiary'))))

#      id annoA annoB annoC
# 1 Gene1     1     1     1
# 2 Gene2     2     1     2
# 3 Gene3     3     3     2
# 4 Gene4     1     3     1
# 5 Gene5    NA     3    NA

or dplyr::recode():

df_char %>%
  mutate(across(starts_with("anno"),
         ~ recode(.x, 'primary' = 1L, 'secondary' = 2L, 'tertiary' = 3L)))
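
On the factor attempt mentioned in the question: by default factor() sorts its levels, but passing levels = explicitly fixes their order, and as.integer() then returns each label's position in that vector. A minimal base R sketch, assuming the three labels from the example data (the names lvls and df_num are chosen here for illustration only):

```r
# Same example data as in the question
df_char <- data.frame(
  id = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  annoA = c("primary", "secondary", "tertiary", "primary", NA),
  annoB = c("primary", "primary", "tertiary", "tertiary", "tertiary"),
  annoC = c("primary", "secondary", "secondary", "primary", NA)
)

# The position of each label in `lvls` becomes its integer code
lvls <- c("primary", "secondary", "tertiary")

# Convert every column except `id`; NA and labels missing from `lvls` become NA
df_num <- df_char
df_num[-1] <- lapply(df_char[-1], function(x) as.integer(factor(x, levels = lvls)))
```

This should give the same codes as the match() call above, since both rely on the position of each label in a fixed vector.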

CodePudding user response:

There are quite a few ways to handle this task; one potential option is to use case_when() (from the dplyr package) across each column you want to recode, e.g.

library(dplyr)

df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

df_char %>%
  mutate(across(starts_with("anno"), ~case_when(
    .x == "primary" ~ 1,
    .x == "secondary" ~ 2,
    .x == "tertiary" ~ 3,
    TRUE ~ NA_real_
  )))
#>      id annoA annoB annoC
#> 1 Gene1     1     1     1
#> 2 Gene2     2     1     2
#> 3 Gene3     3     3     2
#> 4 Gene4     1     3     1
#> 5 Gene5    NA     3    NA

Created on 2023-01-16 with reprex v2.0.2


Another potential option is to create a 'lookup table' of key-value pairs and use that to recode() the columns of interest, e.g.

library(dplyr)

df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

key_value_pairs <- c("primary" = 1, "secondary" = 2, "tertiary" = 3)
df_char %>%
  mutate(across(starts_with("anno"), ~recode(.x, !!!key_value_pairs)))
#>      id annoA annoB annoC
#> 1 Gene1     1     1     1
#> 2 Gene2     2     1     2
#> 3 Gene3     3     3     2
#> 4 Gene4     1     3     1
#> 5 Gene5    NA     3    NA

Created on 2023-01-16 with reprex v2.0.2
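
The lookup-table idea also works without dplyr, because a named vector can be indexed by name. A rough base R sketch, assuming the same three labels (lookup and df_num are illustrative names, not part of the code above):

```r
# Same example data as in the question
df_char <- data.frame(
  id = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  annoA = c("primary", "secondary", "tertiary", "primary", NA),
  annoB = c("primary", "primary", "tertiary", "tertiary", "tertiary"),
  annoC = c("primary", "secondary", "secondary", "primary", NA)
)

# Named vector: names are the labels, values are the codes to assign
lookup <- c(primary = 1, secondary = 2, tertiary = 3)

# Index the lookup by label for every column except `id`;
# NA entries stay NA, and unname() drops the label names from the result
df_num <- df_char
df_num[-1] <- lapply(df_char[-1], function(x) unname(lookup[x]))
```

Unlike a position-based approach, the values in lookup do not have to be consecutive integers, so any numeric code can be assigned to a label.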

CodePudding user response:

A base R option with match():

> df_char[-1] <- match(as.matrix(df_char[-1]), c("primary", "secondary", "tertiary", "ficolin-1", "secretory"))

> df_char
     id annoA annoB annoC
1 Gene1     1     1     1
2 Gene2     2     1     2
3 Gene3     3     3     2
4 Gene4     1     3     1
5 Gene5    NA     3    NA
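
The helper from the question can probably be repaired with the same match() idea. Inside a function, df$col looks for a column literally named "col", whereas df[[col]] uses the column whose name is stored in the variable col. Chained ifelse() calls that mix numbers and strings also tend to return character values such as "1" rather than numbers. The sketch below is a hypothetical variant (the function name, its arguments, and the names df_num and mat are assumptions made for illustration), not the asker's original code:

```r
# Hypothetical repaired variant of the question's helper
granule_converter <- function(df, cols,
                              lvls = c("primary", "secondary", "tertiary",
                                       "ficolin-1", "secretory")) {
  for (col in cols) {
    # df[[col]] selects the column named by the *value* of `col`
    df[[col]] <- match(df[[col]], lvls)
  }
  df
}

# Same example data as in the question
df_char <- data.frame(
  id = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  annoA = c("primary", "secondary", "tertiary", "primary", NA),
  annoB = c("primary", "primary", "tertiary", "tertiary", "tertiary"),
  annoC = c("primary", "secondary", "secondary", "primary", NA)
)

df_num <- granule_converter(df_char, c("annoA", "annoB", "annoC"))

# Numeric matrix with gene ids as row names, e.g. as input for heatmap()
mat <- as.matrix(df_num[-1])
rownames(mat) <- df_num$id
```

Returning a new data frame instead of overwriting df_char keeps the original labels available, which may be handy for the heat map legend.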