I have a data frame with more than 1000 character columns. I want to recode certain character columns as numeric but keep other columns as characters. I hope to achieve this without manually including/excluding columns for recoding (aka the operation needs to be on the entire data frame). I tried using dplyr::recode and dplyr::case_when but the unrecoded columns were all coerced into NA or I had to convert all columns into numeric.
A toy dataset:
df = tibble(name = c('Tom', 'Sarah'), wealth = c('poor', 'wealthy'), health = c('poor', 'good'))
I think dplyr::recode and dplyr::case require all columns to be the same type after recoding, yet I don't want the 'name' column to be numeric.
I tried using:
df_recoded = df %>% mutate(across(everything(),~recode(
.,
'poor' = 1,
'good' = 2,
'wealthy' = 2,
.default = . #doesn't work
)))
Or
df_recoded = df %>% mutate(across(everything(),~case_when(
. == 'poor' ~ 1,
. == 'good' ~ 2,
. == 'wealthy' ~ 2,
TRUE ~ . # doesn't work.
)))
The desired output is:
#> # A tibble: 2 x 3
#> name wealth health
#> <chr> <dbl> <dbl>
#> 1 Tom 1 1
#> 2 Sarah 2 2
CodePudding user response:
Issue is that 1
and 2
needs to be characters when you do this operation.
library(tidyverse)
df = tibble(name = c('Tom', 'Sarah'),
wealth = c('poor', 'wealthy'),
health = c('poor', 'good'))
df %>%
mutate(across(everything(),
~ case_when(.x == "poor" ~ "1",
.x %in% c("good", "wealthy") ~ "2",
TRUE ~ .x)))
#> # A tibble: 2 x 3
#> name wealth health
#> <chr> <chr> <chr>
#> 1 Tom 1 1
#> 2 Sarah 2 2
Created on 2022-06-29 by the reprex package (v2.0.1)
CodePudding user response:
Try this
library(tidyverse)
df = tibble(name = c('Tom', 'Sarah'),
wealth = c('poor', 'wealthy'),
health = c('poor', 'good'))
df %>%
mutate(across(everything(),
~ case_when(.x == "poor" ~ "1",
.x %in% c("good", "wealthy") ~ "2",
TRUE ~ .x))) -> df_recoded
df_recoded[] <- lapply(names(df_recoded) ,
\(x) if(x != "name") as.numeric(df_recoded[[x]]) else df_recoded[[x]])
you can use base R
to simplify the solution
l <- lapply(df , \(x) ifelse(x == "poor" , 1 , ifelse(x %in% c("good", "wealthy") , 2 , x)))
l$wealth <- as.numeric(l$wealth)
l$health <- as.numeric(l$health)
as_tibble(l)
- output
df_recoded
# A tibble: 2 × 3
name wealth health
<chr> <dbl> <dbl>
1 Tom 1 1
2 Sarah 2 2