How to separate a vector within a variable into two variables in the tidyverse in R?

Time:08-25

I have a data set that includes a variable where some of the cases have nested vectors, i.e. some of the cases are just a string, while other cases are vectors of length 2. I would like to make a new variable that just includes the second element of the nested vectors for the cases where there is a nested vector, and is NA if there is no nested vector.

A reproducible example is below:

df <- list(id = 1:5,
           answer = list("Agree",
                         c("Agree", "Strongly Agree"),
                         c("Disagree", "Agree"),
                         "Disagree",
                         c("Disagree", "Strongly Disagree")))

I would like to make a new column in my data frame that has the values

NA, "Strongly Agree", "Agree", NA, "Strongly Disagree"

in that order; in other words, the second element of the vectors where there is a vector, and NA if there is not a vector.

I have attempted to use a mutate() function, as follows:

df %>%
  mutate(answer_split = case_when(length(answer) == 2 ~ answer[[??]][2]))

I am unsure what to put in the place of the ??, or if the mutate()/case_when() combination is the correct one.

Any assistance—either guidance on how to complete the above code, or pointing me in the right direction towards a different method—would be greatly appreciated!

CodePudding user response:

A basic solution with the tidyverse:

library(tidyverse)

df |>
  as_tibble() |>
  unnest_wider(answer)

On older tidyr versions this leaves you with some ugly column names; newer versions may instead stop and ask you to supply names_sep. Another approach that is more verbose but avoids the name warnings is:

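df |>
  as_tibble() |>
  # collapse each vector into one ";"-separated string
  mutate(answer = map_chr(answer, ~paste(., collapse = ";"))) |>
  # fill = "right" pads single answers with NA instead of warning
  separate(answer, into = c("a", "b"), sep = ";", fill = "right")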
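If the automatic column names from unnest_wider() are the only problem, its names_sep argument may be enough. This is a minimal sketch, assuming a tidyr version that has names_sep; the column names answer_1 and answer_2 are generated by that argument and are not part of the original answer:

```r
library(tibble)
library(tidyr)

df <- list(id = 1:5,
           answer = list("Agree",
                         c("Agree", "Strongly Agree"),
                         c("Disagree", "Agree"),
                         "Disagree",
                         c("Disagree", "Strongly Disagree")))

# names_sep builds each new column name from the outer column name
# plus the position of the element inside the vector
wide <- df |>
  as_tibble() |>
  unnest_wider(answer, names_sep = "_")

# the second elements end up in their own column
wide$answer_2
```

Rows whose vector has only one element get NA in answer_2, which matches the output asked for in the question.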

CodePudding user response:

You could use map_chr() from purrr and pass a position as its .f argument, which purrr converts to an extractor function. From the documentation of .f:

If character vector, numeric vector, or list, it is converted to an extractor function. Character vectors index by name and numeric vectors index by position; use a list to index by position and name at different levels. If a component is not present, the value of .default will be returned.

library(dplyr)
library(purrr)

as_tibble(df) %>%
  mutate(second_ans = map_chr(answer, 2, .default = NA_character_))

# # A tibble: 5 × 3
#      id answer    second_ans
#   <int> <list>    <chr>
# 1     1 <chr [1]> NA
# 2     2 <chr [2]> Strongly Agree
# 3     3 <chr [2]> Agree
# 4     4 <chr [1]> NA
# 5     5 <chr [2]> Strongly Disagree
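The other extractor forms described in the quoted documentation can be sketched on a small example. The people list and its name and langs fields are hypothetical, made up purely for illustration:

```r
library(purrr)

# Hypothetical toy data for illustration; not part of the question
people <- list(
  list(name = "Ann", langs = c("R", "Python")),
  list(name = "Bob", langs = "R")
)

# A string extracts by name
by_name <- map_chr(people, "name")

# A list mixes name and position: take `langs`, then its 2nd element,
# falling back to .default where that element does not exist
second_lang <- map_chr(people, list("langs", 2), .default = NA_character_)
```

Here by_name holds both names, while second_lang is NA for the entry with only one language, the same fallback the answer relies on for single-element vectors.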

You could also use sapply() from base R:

as_tibble(df) %>%
  mutate(second_ans = sapply(answer, `[`, 2))
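The case_when() attempt from the question can also be completed. The sticking point is that length(answer) measures the whole column, whereas lengths(answer) gives one length per row. A sketch under that reading, using vapply() as a stricter stand-in for sapply() (it errors if any extracted value is not a single string); the column name answer_split is taken from the question:

```r
library(dplyr)
library(tibble)

df <- list(id = 1:5,
           answer = list("Agree",
                         c("Agree", "Strongly Agree"),
                         c("Disagree", "Agree"),
                         "Disagree",
                         c("Disagree", "Strongly Disagree")))

out <- as_tibble(df) %>%
  mutate(
    answer_split = case_when(
      # lengths() is vectorised over the list column;
      # rows that match no condition are left as NA
      lengths(answer) == 2 ~ vapply(answer, function(x) x[2], character(1))
    )
  )

out$answer_split
```

Because indexing past the end of a character vector already returns NA, the case_when() wrapper is optional here; it mainly makes the "only when there are two elements" condition explicit.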