I have a data set that includes a variable where some of the cases have nested vectors, i.e. some of the cases are just a string, while other cases are vectors of length 2. I would like to make a new variable that just includes the second element of the nested vectors for the cases where there is a nested vector, and is NA if there is no nested vector.
A reproducible example is below:
df <- list(id = 1:5,
           answer = list("Agree",
                         c("Agree", "Strongly Agree"),
                         c("Disagree", "Agree"),
                         "Disagree",
                         c("Disagree", "Strongly Disagree")))
I would like to make a new column in my data frame that has the values
NA, "Strongly Agree", "Agree", NA, "Strongly Disagree"
in that order; in other words, the second element of the vectors where there is a vector, and NA if there is not a vector.
I have attempted to use a mutate() function, as follows:
df %>%
  mutate(answer_split = case_when(length(answer) == 2 ~ answer[[??]][2]))
I am unsure what to put in the place of the ??, or if the mutate()/case_when() combination is the correct one.
Any assistance—either guidance on how to complete the above code, or pointing me in the right direction towards a different method—would be greatly appreciated!
CodePudding user response:
Basic solution with tidyverse:
df |>
  as_tibble() |>
  unnest_wider(answer)
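Because the vectors are unnamed, this will leave you with some ugly auto-generated column names (and, depending on your tidyr version, a naming warning or a request to supply names_sep). One other approach that is more verbose but avoids the naming problem is: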
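df |>
  as_tibble() |>
  # collapse each vector into a single ";"-separated string
  mutate(answer = map_chr(answer, ~paste(., collapse = ";"))) |>
  # fill = "right" pads single-element rows with NA without a warning
  separate(answer, into = c("a", "b"), sep = ";", fill = "right")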
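If you would rather stay with unnest_wider(), here is a minimal sketch that assumes a tidyr version supporting the names_sep argument. With unnamed vectors, names_sep builds the new column names from the original column name plus the element position, so the second elements end up in a column called answer_2. The object name wide is only illustrative.

```r
library(tibble)
library(tidyr)

# Same example data as in the question
df <- list(id = 1:5,
           answer = list("Agree",
                         c("Agree", "Strongly Agree"),
                         c("Disagree", "Agree"),
                         "Disagree",
                         c("Disagree", "Strongly Disagree")))

# names_sep = "_" names the new columns answer_1 and answer_2;
# rows whose vector has only one element get NA in answer_2
wide <- as_tibble(df) |>
  unnest_wider(answer, names_sep = "_")

wide$answer_2
```

The column answer_2 is then the variable the question asks for, and answer_1 can be dropped if it is not needed.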
CodePudding user response:
You could use map from purrr as an extractor function. The description of its argument .f:
If character vector, numeric vector, or list, it is converted to an extractor function. Character vectors index by name and numeric vectors index by position; use a list to index by position and name at different levels. If a component is not present, the value of .default will be returned.
library(dplyr)
library(purrr)
as_tibble(df) %>%
  # extract position 2 of each element; a typed NA keeps the result character
  mutate(second_ans = map_chr(answer, 2, .default = NA_character_))
# # A tibble: 5 × 3
#      id answer    second_ans
#   <int> <list>    <chr>
# 1     1 <chr [1]> NA
# 2     2 <chr [2]> Strongly Agree
# 3     3 <chr [2]> Agree
# 4     4 <chr [1]> NA
# 5     5 <chr [2]> Strongly Disagree
You could also use sapply from base:
as_tibble(df) %>%
  mutate(second_ans = sapply(answer, `[`, 2))
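As for the original case_when() attempt: inside mutate(), length(answer) is the length of the whole column (5 here), not of each element, so the condition never tests the individual vectors. The per-element equivalent is lengths(). Below is a minimal base R sketch of that idea; the names has_second, second and answer_split are only illustrative.

```r
# Same example data as in the question
df <- list(id = 1:5,
           answer = list("Agree",
                         c("Agree", "Strongly Agree"),
                         c("Disagree", "Agree"),
                         "Disagree",
                         c("Disagree", "Strongly Disagree")))

# lengths() returns one length per list element (1 2 2 1 2),
# whereas length() would return 5 for the list as a whole
has_second <- lengths(df$answer) == 2

# vapply() enforces a character result; x[2] is NA for length-1 vectors
second <- vapply(df$answer, function(x) x[2], character(1))

# Keep the second element only where the vector really has two elements
answer_split <- ifelse(has_second, second, NA_character_)
answer_split
```

In this data the ifelse() step is redundant, because indexing past the end of a vector already gives NA, but it makes the length-2 condition from the question explicit.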