Why does `stack` not work on the result of `tapply`?

Time:12-14

Suppose I have a data frame df:

> dput(df)
structure(list(x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
    y = c("A", "B", "C", "B", "C", "A", "C", "D")), class = "data.frame", row.names = c(NA,
-8L))

> df
  x y
1 X A
2 X B
3 X C
4 Y B
5 Y C
6 Z A
7 Z C
8 Z D

and I generate a list u1 as follows:

u1 <- with(
  df,
  tapply(y, x, combn, 2, toString)
)

where

> u1
$X
[1] "A, B" "A, C" "B, C"

$Y
[1] "B, C"

$Z
[1] "A, C" "A, D" "C, D"

> str(u1)
List of 3
 $ X: chr [1:3(1d)] "A, B" "A, C" "B, C"
 $ Y: chr [1(1d)] "B, C"
 $ Z: chr [1:3(1d)] "A, C" "A, D" "C, D"
 - attr(*, "dim")= int 3
 - attr(*, "dimnames")=List of 1
  ..$ : chr [1:3] "X" "Y" "Z"

When I run stack(u1), I get the following error:

> stack(u1)
Error in stack.default(u1) : at least one vector element is required

It seems that I cannot use stack on the output of tapply directly, even though it is a named list.

However, when I postprocess it with u2 <- Map(c, u1), stack works again:

> u2 <- Map(c, u1)

> u2
$X
[1] "A, B" "A, C" "B, C"

$Y
[1] "B, C"

$Z
[1] "A, C" "A, D" "C, D"


> str(u2)
List of 3
 $ X: chr [1:3] "A, B" "A, C" "B, C"
 $ Y: chr "B, C"
 $ Z: chr [1:3] "A, C" "A, D" "C, D"

> stack(u2)
  values ind
1   A, B   X
2   A, C   X
3   B, C   X
4   B, C   Y
5   A, C   Z
6   A, D   Z
7   C, D   Z

As str(u2) shows, the dim attributes have been dropped, which seems to solve the issue.


My question is:

Why does stack fail on u1 but succeed on u2? Is there any way to use stack on the output of tapply without postprocessing (like Map(c, u1))?

CodePudding user response:

tapply returns an array (an array of mode "list" if FUN returns non-scalar values or if you set simplify = FALSE), and stack doesn't like array input. The tapply documentation doesn't suggest that there are other output options. From ?tapply (emphasis mine):

simplify:

logical; if FALSE, tapply always returns an array of mode "list"; in other words, a list with a dim attribute. If TRUE (the default), then if FUN always returns a scalar, tapply returns an array with the mode of the scalar.
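To see this on the question's data, here is a small sketch that rebuilds df and u1 and inspects the attributes. It assumes only base R. It also suggests that the dim on each element comes from combn, which returns a 1-d array when FUN yields a single value per combination.

```r
# Rebuild the question's data (stringsAsFactors = FALSE keeps y as character
# on older R versions too)
df <- data.frame(
  x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
  y = c("A", "B", "C", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)
u1 <- with(df, tapply(y, x, combn, 2, toString))

# The container is a list that carries a dim attribute (an array of mode "list")
is.list(u1)
dim(u1)

# Each element also carries a dim attribute, so it is a 1-d array, not a plain vector
first_elem <- u1[[1]]
dim(first_elem)
is.vector(first_elem)

# combn() on its own already returns a 1-d array here
combn_out <- combn(c("A", "B", "C"), 2, toString)
dim(combn_out)
```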

So I'd recommend casting to character:

stack(lapply(u1, as.character))
#   values ind
# 1   A, B   X
# 2   A, C   X
# 3   B, C   X
# 4   B, C   Y
# 5   A, C   Z
# 6   A, D   Z
# 7   C, D   Z

If you're concerned about speed, you could run benchmarks; removing the dim attribute might be faster than as.character():

stack(lapply(u1, "dim<-", NULL))
# same result
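A rough way to compare the two, assuming only base R: the sketch below first checks that both approaches give the same data frame, then times each conversion in a loop with system.time. The repetition count of 1e4 is an arbitrary choice of mine, and timings will vary by machine, so no expected timings are shown.

```r
df <- data.frame(
  x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
  y = c("A", "B", "C", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)
u1 <- with(df, tapply(y, x, combn, 2, toString))

# Both conversions turn each 1-d array into a plain character vector
res_char <- stack(lapply(u1, as.character))
res_dim  <- stack(lapply(u1, "dim<-", NULL))
same <- identical(res_char, res_dim)
same

# Crude timing; a dedicated benchmarking package would give better estimates
n_rep <- 1e4
system.time(for (i in seq_len(n_rep)) lapply(u1, as.character))
system.time(for (i in seq_len(n_rep)) lapply(u1, "dim<-", NULL))
```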

CodePudding user response:

Alternatively, use as.vector or c to drop the dim attribute, converting each 1-d array to a plain vector:

stack(lapply(u1, c))
  values ind
1   A, B   X
2   A, C   X
3   B, C   X
4   B, C   Y
5   A, C   Z
6   A, D   Z
7   C, D   Z

According to ?stack:

Note that stack applies to vectors (as determined by is.vector): non-vector columns (e.g., factors) will be ignored with a warning.

is.vector returns FALSE for every element of u1:

> sapply(u1, is.vector)
    X     Y     Z 
FALSE FALSE FALSE 
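The reason is that is.vector only returns TRUE when the object has no attributes other than names. A minimal illustration with made-up values (v and a are names chosen just for this sketch):

```r
v <- c("A, B", "A, C")
a <- array(v)  # same data, but now with a dim attribute (a 1-d array)

is.vector(v)             # TRUE: no attributes
is.vector(a)             # FALSE: dim is an attribute other than names
is.vector(c(a))          # TRUE: c() drops the dim attribute
is.vector(as.vector(a))  # TRUE: as.vector() drops it as well
is.vector(factor(v))     # FALSE: factors carry class and levels attributes
```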

However, enframe works on u1 directly:

library(tibble)
library(tidyr)
enframe(u1) %>%
   unnest(value)
# A tibble: 7 × 2
  name  value
  <chr> <chr>
1 X     A, B 
2 X     A, C 
3 X     B, C 
4 Y     B, C 
5 Z     A, C 
6 Z     A, D 
7 Z     C, D 
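For comparison, roughly the same long format can be built by hand in base R without touching the elements of u1 first. This is only a sketch; the column names name and value are chosen to mirror the enframe output, and long is a name made up for the example.

```r
df <- data.frame(
  x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
  y = c("A", "B", "C", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)
u1 <- with(df, tapply(y, x, combn, 2, toString))

# Number of values per group, stripped of any array attributes
n <- as.vector(lengths(u1))

long <- data.frame(
  name  = rep(names(u1), times = n),      # group label repeated per value
  value = unlist(u1, use.names = FALSE),  # all values flattened in group order
  stringsAsFactors = FALSE
)
long
```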