Home > database >  Recode only certain values keep others as it is while preserving multiple data types in different co
Recode only certain values keep others as it is while preserving multiple data types in different co

Time:06-30

I have a data frame with more than 1000 character columns. I want to recode certain character columns as numeric but keep other columns as characters. I hope to achieve this without manually including/excluding columns for recoding (aka the operation needs to be on the entire data frame). I tried using dplyr::recode and dplyr::case_when but the unrecoded columns were all coerced into NA or I had to convert all columns into numeric.

A toy dataset:

 df = tibble(name = c('Tom', 'Sarah'), wealth = c('poor', 'wealthy'), health = c('poor', 'good')) 

I think dplyr::recode and dplyr::case require all columns to be the same type after recoding, yet I don't want the 'name' column to be numeric.

I tried using:

df_recoded = df %>% mutate(across(everything(),~recode(
  ., 
  'poor' = 1, 
  'good' = 2,
  'wealthy' = 2,
  .default = .    #doesn't work
)))

Or

 df_recoded = df %>% mutate(across(everything(),~case_when(
      . == 'poor' ~ 1, 
      . == 'good' ~ 2,
      . == 'wealthy' ~ 2, 
      TRUE ~ .    # doesn't work. 
    )))

The desired output is:

#> # A tibble: 2 x 3
#>   name  wealth health
#>   <chr> <dbl>  <dbl> 
#> 1 Tom   1      1     
#> 2 Sarah 2      2

CodePudding user response:

Issue is that 1 and 2 needs to be characters when you do this operation.

library(tidyverse)

df = tibble(name = c('Tom', 'Sarah'), 
            wealth = c('poor', 'wealthy'), 
            health = c('poor', 'good')) 

df %>%  
  mutate(across(everything(), 
                ~ case_when(.x == "poor" ~ "1", 
                                 .x %in% c("good", "wealthy") ~ "2",
                                 TRUE ~ .x)))
#> # A tibble: 2 x 3
#>   name  wealth health
#>   <chr> <chr>  <chr> 
#> 1 Tom   1      1     
#> 2 Sarah 2      2

Created on 2022-06-29 by the reprex package (v2.0.1)

CodePudding user response:

Try this

library(tidyverse)

df = tibble(name = c('Tom', 'Sarah'), 
            wealth = c('poor', 'wealthy'), 
            health = c('poor', 'good')) 

df %>%  
    mutate(across(everything(), 
                  ~ case_when(.x == "poor" ~ "1", 
                              .x %in% c("good", "wealthy") ~ "2",
                              TRUE ~ .x))) -> df_recoded

df_recoded[] <- lapply(names(df_recoded) , 
\(x) if(x != "name") as.numeric(df_recoded[[x]]) else df_recoded[[x]])

you can use base R to simplify the solution

l <- lapply(df , \(x) ifelse(x == "poor" , 1 , ifelse(x %in% c("good", "wealthy") , 2 , x)))

l$wealth <- as.numeric(l$wealth)
l$health <- as.numeric(l$health)

as_tibble(l)

  • output
df_recoded

# A tibble: 2 × 3
  name  wealth health
  <chr>  <dbl>  <dbl>
1 Tom        1      1
2 Sarah      2      2
  •  Tags:  
  • r
  • Related