I have a data frame that looks like this:
ID pick1 pick2 pick3
1 NA 21/11/29 21/11/30
2 21/11/28 21/11/29 NA
3 21/11/28 NA 21/11/30
4 NA 21/11/29 21/11/30
Each participant (ID) could pick 2 dates out of 3 options. Now I want to summarize the selected dates to get a tibble like this:
ID date1 date2
1 21/11/29 21/11/30
2 21/11/28 21/11/29
3 21/11/28 21/11/30
4 21/11/29 21/11/30
However, I can't get it working using tidyverse functions only. I have started to use this library but couldn't find a solution for my issue online - please help! :-)
CodePudding user response:
One option is with rowwise
- group by rows, do the sort
with na.last
as TRUE, keep the sorted output in a list
, unnest
to multiple columns, and select
only columns having at least one non-NA elements
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
rowwise %>%
transmute(ID, date = list(sort(c_across(starts_with('pick')),
na.last = TRUE))) %>%
ungroup %>%
unnest_wider(date) %>%
rename_with(~ str_c('date', seq_along(.)), -ID) %>%
select(where(~ any(!is.na(.))))
-output
# A tibble: 4 × 3
ID date1 date2
<int> <chr> <chr>
1 1 21/11/29 21/11/30
2 2 21/11/28 21/11/29
3 3 21/11/28 21/11/30
4 4 21/11/29 21/11/30
or reshape to 'long' format with pivot_longer
remove the NA
s and reshape it back to 'wide' format
library(stringr)
df1 %>%
pivot_longer(cols = -ID, values_drop_na = TRUE) %>%
group_by(ID) %>%
mutate(name = str_c('date', row_number())) %>%
ungroup %>%
pivot_wider(names_from = name, values_from = value)
-output
# A tibble: 4 × 3
ID date1 date2
<int> <chr> <chr>
1 1 21/11/29 21/11/30
2 2 21/11/28 21/11/29
3 3 21/11/28 21/11/30
4 4 21/11/29 21/11/30
data
df1 <- structure(list(ID = 1:4, pick1 = c(NA, "21/11/28", "21/11/28",
NA), pick2 = c("21/11/29", "21/11/29", NA, "21/11/29"), pick3 = c("21/11/30",
NA, "21/11/30", "21/11/30")), class = "data.frame",
row.names = c(NA,
-4L))
CodePudding user response:
Yet another solution:
library(tidyverse)
df <- data.frame(
stringsAsFactors = FALSE,
ID = c(1L, 2L, 3L, 4L),
pick1 = c(NA, "21/11/28", "21/11/28", NA),
pick2 = c("21/11/29", "21/11/29", NA, "21/11/29"),
pick3 = c("21/11/30", NA, "21/11/30", "21/11/30")
)
df %>%
pivot_longer(cols = str_c("pick",1:3), values_drop_na = T) %>%
mutate(name = rep(c("date1","date2"), n()/2)) %>%
pivot_wider(ID)
#> # A tibble: 4 × 3
#> ID date1 date2
#> <int> <chr> <chr>
#> 1 1 21/11/29 21/11/30
#> 2 2 21/11/28 21/11/29
#> 3 3 21/11/28 21/11/30
#> 4 4 21/11/29 21/11/30