Home > front end >  Can a heterogenous pivot_wider be done using pivot specs?
Can a heterogenous pivot_wider be done using pivot specs?

Time:12-12

Certain "heterogenous" pivots can be achieved by joining multiple simpler pivots. Do pivot specs allow these to happen in one go? For example, say you have this data:

library(dplyr)
library(tidyr)

set.seed(123)

Z <- tibble(id = rep(1:3, 2:4),
            x = runif(length(id)),
            month = sample(month.abb, length(id)))

Suppose for each id you want to sum its xs and then report in which months those xs happened via monthly true/false columns. You could do it as follows, using two pivots:

Z |> 
  mutate(name = 'x') ->
  Z1

Z |> 
  mutate(value = TRUE) ->
  Z2

inner_join(
  Z1 |> pivot_wider(id_cols = id, values_from = x, values_fn = sum), # alternatively group_by & summarize
  Z2 |> pivot_wider(id_cols = id, names_from = month, values_fill = FALSE),
  by = 'id') ->
  desired_result

A variation of the above using pivot specs (and still using two pivots) is:

Z1 |> 
  build_wider_spec(values_from = x) ->
  spec1

Z2 |> 
  build_wider_spec(names_from = month) ->
  spec2

inner_join(
  Z1 |> pivot_wider_spec(spec1, id_cols = id, values_fn = sum),
  Z2 |> pivot_wider_spec(spec2, id_cols = id, values_fill = FALSE),
  by = 'id') ->
  desired_result2

stopifnot(identical(desired_result, desired_result2))

My question is whether using pivot specs this can be done via a single pivot. What I have in mind is something along the lines of:

# Does not work
Z |> 
  mutate(name = 'x', value = TRUE) |> 
  pivot_wider_spec(bind_rows(spec1, spec2), id_cols = id, values_fn = list(x = sum))

Although the above does not work.

(FWIW, I suspect that the above does not work because bind_rows(spec1, spec2) contains NAs; more generally I suspect pivot_wider is not designed to cover this case; however I thought it would be worth asking the experts.)

CodePudding user response:

Perhaps not 100% what you had in mind but using one mutate step and the id_cols and unused_fn arguments of pivot_wider you could do:

library(dplyr, warn=FALSE)
library(tidyr)

Z |>
  mutate(value = TRUE) |>
  pivot_wider(
    id_cols = id, names_from = month, values_from = value,
    unused_fn = sum, values_fill = FALSE
  )
#> # A tibble: 3 × 11
#>      id May   Apr   Jun   Sep   Feb   Mar   Dec   Jul   Oct       x
#>   <int> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl>
#> 1     1 TRUE  TRUE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE  1.08
#> 2     2 FALSE FALSE TRUE  TRUE  TRUE  FALSE FALSE FALSE FALSE  2.23
#> 3     3 FALSE FALSE FALSE FALSE FALSE TRUE  TRUE  TRUE  TRUE   2.02

CodePudding user response:

Here, we don't need two pivot_wider. Instead, it can be done by creating a new column of sum and then do pivot_wider once by making use of values_fn and values_fill

library(dplyr)
library(tidyr)
Z %>%
   group_by(id) %>% 
   mutate(x1 = x, x = sum(x)) %>%
   ungroup %>% 
   pivot_wider(names_from = month, values_from = x1, 
         values_fn = ~ TRUE, values_fill = FALSE)

-output

# A tibble: 3 × 11
     id     x May   Apr   Jun   Sep   Feb   Mar   Dec   Jul   Oct  
  <int> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1     1  1.08 TRUE  TRUE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2     2  2.23 FALSE FALSE TRUE  TRUE  TRUE  FALSE FALSE FALSE FALSE
3     3  2.02 FALSE FALSE FALSE FALSE FALSE TRUE  TRUE  TRUE  TRUE 

-OP's output

> desired_result
# A tibble: 3 × 11
     id     x May   Apr   Jun   Sep   Feb   Mar   Dec   Jul   Oct  
  <int> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1     1  1.08 TRUE  TRUE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2     2  2.23 FALSE FALSE TRUE  TRUE  TRUE  FALSE FALSE FALSE FALSE
3     3  2.02 FALSE FALSE FALSE FALSE FALSE TRUE  TRUE  TRUE  TRUE 

EDIT: Based on @banbh comments

  • Related