Home > OS >  How can I transform multiple choice character columns to numeric indicator columns using tidyverse?
How can I transform multiple choice character columns to numeric indicator columns using tidyverse?

Time:08-10

Some types of survey software handle "choose all that apply" questions in the following inconvenient way. Suppose a question asked "What type of pet(s) do you own? Choose all that apply: dog, cat, ferret, snake." The resulting dataset looks like this:

pet_tab <- tibble(
  owner = 1:5,
  pet_1 = c("dog", "cat", "ferret", "dog", "snake"),
  pet_2 = c("cat", "ferret", NA, "ferret", NA),
  pet_3 = c("ferret", NA, NA, "snake", NA),
  pet_4 = c("snake", NA, NA, NA, NA)
  )

  owner pet_1  pet_2  pet_3  pet_4
   1    dog    cat    ferret snake
   2    cat    ferret NA     NA   
   3    ferret NA     NA     NA   
   4    dog    ferret snake  NA   
   5    snake  NA     NA     NA  

This is hard to work with. A far better way to organize this data would be like this:

  owner  dog   cat ferret snake
     1     1     1      1     1
     2     0     1      1     0
     3     0     0      1     0
     4     1     0      1     1
     5     0     0      0     1

where each column indicates whether or not an owner has a given type of animal. How can I transform the first type of data into the second type? I realize there are a lot of ways to do this, but I'd like something elegant, concise, and preferably using tidyverse, though data.table will suffice as well.

CodePudding user response:

We can reshape to 'long' format with pivot_longer on the 'pet' columns and then to wide with pivot_wider with values_fn as length and values_fill as 0

library(dplyr)
library(tidyr)
pet_tab  %>%
   pivot_longer(cols = starts_with('pet'), values_drop_na = TRUE) %>% 
   pivot_wider(names_from = value, values_from = name,
      values_fn = ~  (length(.x) > 0), values_fill = 0)

-output

# A tibble: 5 × 5
  owner   dog   cat ferret snake
  <int> <int> <int>  <int> <int>
1     1     1     1      1     1
2     2     0     1      1     0
3     3     0     0      1     0
4     4     1     0      1     1
5     5     0     0      0     1

Or in base R with table

 (table(pet_tab$owner[row(pet_tab[-1])], unlist(pet_tab[-1])) > 0)
   
    cat dog ferret snake
  1   1   1      1     1
  2   1   0      1     0
  3   0   0      1     0
  4   0   1      1     1
  5   0   0      0     1
  • Related