I have some data from a a Google Forms and I'd like to slipt the common-separated answers and duplicate the participant's ID
- The data looks like this:
> head(data)
names Q2 Q3 Q4
1 PART_1 fruits bananas, apples brocolli, lettuce, potatoes
2 PART_2 vegetables bananas, oranges brocolli
3 PART_3 fruits carrots, brocolli, lettuce
- Desired output #1 (filling in with Nas):
names Q2 Q3 Q4
1 PART_1 fruits bananas brocolli
PART_1 NA apples lettuce,
PART_1 NA NA potatoes
so on...
- Desired output #2 (repeat the non-multiple choice answers (as Q1):
names Q2 Q3 Q4
1 PART_1 fruits bananas brocolli
PART_1 fruits apples lettuce,
PART_1 fruits NA potatoes
so on...
- If it's possible, a
tidyverse
solution would be much appreciated!
Obs: The ideia is pretty much like this SQL question. I've seen this R question, but I'd like to repeat the participant's name, not rename them
- data:
structure(list(names = c("PART_1", "PART_2", "PART_3"), Q2 = c("fruits",
"vegetables", "fruits"), Q3 = c("bananas, apples", "bananas, oranges",
""), Q4 = c("brocolli, lettuce, potatoes", "brocolli", "carrots, brocolli, lettuce"
)), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
You can do:
library(tidyr)
library(dplyr)
dat %>%
pivot_longer(-c(Q2, names)) %>%
separate_rows(value) %>%
group_by(names, name) %>%
mutate(row = row_number()) %>%
pivot_wider() %>%
select(-row)
# A tibble: 8 × 4
# Groups: names [3]
names Q2 Q3 Q4
<chr> <chr> <chr> <chr>
1 PART_1 fruits "bananas" brocolli
2 PART_1 fruits "apples" lettuce
3 PART_1 fruits NA potatoes
4 PART_2 vegetables "bananas" brocolli
5 PART_2 vegetables "oranges" NA
6 PART_3 fruits "" carrots
7 PART_3 fruits NA brocolli
8 PART_3 fruits NA lettuce
CodePudding user response:
This is currently not directly possible with separate_rows
, which is otherwise the tidyverse
way to go, but you can use cSplit
from splitstackshape
:
library(dplyr)
library(splitstackshape)
cSplit(data, c("Q3", "Q4"), sep = ", ", "long") %>%
filter(if_any(c("Q3", "Q4"), complete.cases))
# names Q2 Q3 Q4
# 1 PART_1 fruits bananas brocolli
# 2 PART_1 fruits apples lettuce
# 3 PART_1 fruits <NA> potatoes
# 4 PART_2 vegetables bananas brocolli
# 5 PART_2 vegetables oranges <NA>
# 6 PART_3 fruits <NA> carrots
# 7 PART_3 fruits <NA> brocolli
# 8 PART_3 fruits <NA> lettuce
CodePudding user response:
Using base R
lst1 <- lapply(data[3:4], strsplit, split = ",\\s ")
mx <- do.call(pmax, lapply(lst1, lengths))
lst2 <- lapply(lst1, \(x) unlist(Map(`length<-`, x, mx)))
out <- cbind(data[rep(seq_len(nrow(data)), mx), 1:2], lst2)
row.names(out) <- NULL
-output
> out
names Q2 Q3 Q4
1 PART_1 fruits bananas brocolli
2 PART_1 fruits apples lettuce
3 PART_1 fruits <NA> potatoes
4 PART_2 vegetables bananas brocolli
5 PART_2 vegetables oranges <NA>
6 PART_3 fruits <NA> carrots
7 PART_3 fruits <NA> brocolli
8 PART_3 fruits <NA> lettuce