Certain "heterogenous" pivots can be achieved by joining multiple simpler pivots. Do pivot specs allow these to happen in one go? For example, say you have this data:
library(dplyr)
library(tidyr)
set.seed(123)
Z <- tibble(id = rep(1:3, 2:4),
x = runif(length(id)),
month = sample(month.abb, length(id)))
Suppose for each id
you want to sum its x
s and then report in which months those x
s happened via monthly true/false columns. You could do it as follows, using two pivots:
Z |>
mutate(name = 'x') ->
Z1
Z |>
mutate(value = TRUE) ->
Z2
inner_join(
Z1 |> pivot_wider(id_cols = id, values_from = x, values_fn = sum), # alternatively group_by & summarize
Z2 |> pivot_wider(id_cols = id, names_from = month, values_fill = FALSE),
by = 'id') ->
desired_result
A variation of the above using pivot specs (and still using two pivots) is:
Z1 |>
build_wider_spec(values_from = x) ->
spec1
Z2 |>
build_wider_spec(names_from = month) ->
spec2
inner_join(
Z1 |> pivot_wider_spec(spec1, id_cols = id, values_fn = sum),
Z2 |> pivot_wider_spec(spec2, id_cols = id, values_fill = FALSE),
by = 'id') ->
desired_result2
stopifnot(identical(desired_result, desired_result2))
My question is whether using pivot specs this can be done via a single pivot. What I have in mind is something along the lines of:
# Does not work
Z |>
mutate(name = 'x', value = TRUE) |>
pivot_wider_spec(bind_rows(spec1, spec2), id_cols = id, values_fn = list(x = sum))
Although the above does not work.
(FWIW, I suspect that the above does not work because bind_rows(spec1, spec2)
contains NA
s; more generally I suspect pivot_wider
is not designed to cover this case; however I thought it would be worth asking the experts.)
CodePudding user response:
Perhaps not 100% what you had in mind but using one mutate
step and the id_cols
and unused_fn
arguments of pivot_wider
you could do:
library(dplyr, warn=FALSE)
library(tidyr)
Z |>
mutate(value = TRUE) |>
pivot_wider(
id_cols = id, names_from = month, values_from = value,
unused_fn = sum, values_fill = FALSE
)
#> # A tibble: 3 × 11
#> id May Apr Jun Sep Feb Mar Dec Jul Oct x
#> <int> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl>
#> 1 1 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1.08
#> 2 2 FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE 2.23
#> 3 3 FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE 2.02
CodePudding user response:
Here, we don't need two pivot_wider
. Instead, it can be done by creating a new column of sum and then do pivot_wider
once by making use of values_fn
and values_fill
library(dplyr)
library(tidyr)
Z %>%
group_by(id) %>%
mutate(x1 = x, x = sum(x)) %>%
ungroup %>%
pivot_wider(names_from = month, values_from = x1,
values_fn = ~ TRUE, values_fill = FALSE)
-output
# A tibble: 3 × 11
id x May Apr Jun Sep Feb Mar Dec Jul Oct
<int> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 1 1.08 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 2 2.23 FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
3 3 2.02 FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
-OP's output
> desired_result
# A tibble: 3 × 11
id x May Apr Jun Sep Feb Mar Dec Jul Oct
<int> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 1 1.08 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 2 2.23 FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
3 3 2.02 FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
EDIT: Based on @banbh comments