Home > Software design >  Splitting a String with Delimiters into Dummy Variables
Splitting a String with Delimiters into Dummy Variables

Time:06-16

So the goal is to take a table like this:

Person Answer
Matt A;B;C;
Sandy B;D;
Charles A;C;D;

And transform it to this:

Person A B C D
Matt 1 1 1 0
Sandy 0 1 0 1
Charles 1 0 1 1

I feel there is likely an appropriate way to handle this in tidyverse but none of the pivot, unnest, or mutate methods seem ideal in a way that doesn't involve a lot of messy processing.

How should I go about solving this?

CodePudding user response:

The trick is to use tidyr::separate_rows() to move your data to a longer format.

Once all your answers are extracted, it is easy to pivot it back to a wide format with tidyr::pivot_wider()

library(tidyverse)
d <- tibble::tribble(
  ~Person,  ~Answer,
  "Matt", "A;B;C;",
  "Sandy",   "B;D;",
  "Charles", "A;C;D;"
)

d |> 
  tidyr::separate_rows(Answer, sep = ";") |> 
  filter(Answer != "") |> 
  mutate(value = 1) |> 
  pivot_wider(
    names_from = Answer,
    values_from = value,
    values_fill = 0
  )
#> # A tibble: 3 x 5
#>   Person      A     B     C     D
#>   <chr>   <dbl> <dbl> <dbl> <dbl>
#> 1 Matt        1     1     1     0
#> 2 Sandy       0     1     0     1
#> 3 Charles     1     0     1     1

Created on 2022-06-15 by the reprex package (v2.0.1)

  • Related