Some types of survey software handle "choose all that apply" questions in the following inconvenient way. Suppose a question asked "What type of pet(s) do you own? Choose all that apply: dog, cat, ferret, snake." The resulting dataset looks like this:
pet_tab <- tibble(
owner = 1:5,
pet_1 = c("dog", "cat", "ferret", "dog", "snake"),
pet_2 = c("cat", "ferret", NA, "ferret", NA),
pet_3 = c("ferret", NA, NA, "snake", NA),
pet_4 = c("snake", NA, NA, NA, NA)
)
owner pet_1 pet_2 pet_3 pet_4
1 dog cat ferret snake
2 cat ferret NA NA
3 ferret NA NA NA
4 dog ferret snake NA
5 snake NA NA NA
This is hard to work with. A far better way to organize this data would be like this:
owner dog cat ferret snake
1 1 1 1 1
2 0 1 1 0
3 0 0 1 0
4 1 0 1 1
5 0 0 0 1
where each column indicates whether or not an owner has a given type of animal. How can I transform the first type of data into the second type? I realize there are a lot of ways to do this, but I'd like something elegant, concise, and preferably using tidyverse, though data.table will suffice as well.
CodePudding user response:
We can reshape to 'long' format with pivot_longer
on the 'pet' columns and then to wide with pivot_wider
with values_fn
as length
and values_fill
as 0
library(dplyr)
library(tidyr)
pet_tab %>%
pivot_longer(cols = starts_with('pet'), values_drop_na = TRUE) %>%
pivot_wider(names_from = value, values_from = name,
values_fn = ~ (length(.x) > 0), values_fill = 0)
-output
# A tibble: 5 × 5
owner dog cat ferret snake
<int> <int> <int> <int> <int>
1 1 1 1 1 1
2 2 0 1 1 0
3 3 0 0 1 0
4 4 1 0 1 1
5 5 0 0 0 1
Or in base R
with table
(table(pet_tab$owner[row(pet_tab[-1])], unlist(pet_tab[-1])) > 0)
cat dog ferret snake
1 1 1 1 1
2 1 0 1 0
3 0 0 1 0
4 0 1 1 1
5 0 0 0 1