Suppose I have a data frame df:
> dput(df)
structure(list(x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
y = c("A", "B", "C", "B", "C", "A", "C", "D")), class = "data.frame", row.names = c(NA,
-8L))
> df
x y
1 X A
2 X B
3 X C
4 Y B
5 Y C
6 Z A
7 Z C
8 Z D
and I generate a list u1 as follows:
u1 <- with(
df,
tapply(y, x, combn, 2, toString)
)
where
> u1
$X
[1] "A, B" "A, C" "B, C"
$Y
[1] "B, C"
$Z
[1] "A, C" "A, D" "C, D"
> str(u1)
List of 3
$ X: chr [1:3(1d)] "A, B" "A, C" "B, C"
$ Y: chr [1(1d)] "B, C"
$ Z: chr [1:3(1d)] "A, C" "A, D" "C, D"
- attr(*, "dim")= int 3
- attr(*, "dimnames")=List of 1
..$ : chr [1:3] "X" "Y" "Z"
When I run stack(u1), I get the following error:
> stack(u1)
Error in stack.default(u1) : at least one vector element is required
It seems that I cannot use stack on the output of tapply directly, even though it is a named list.
However, when I postprocess it with u2 <- Map(c, u1), things work again:
> u2 <- Map(c, u1)
> u2
$X
[1] "A, B" "A, C" "B, C"
$Y
[1] "B, C"
$Z
[1] "A, C" "A, D" "C, D"
> str(u2)
List of 3
$ X: chr [1:3] "A, B" "A, C" "B, C"
$ Y: chr "B, C"
$ Z: chr [1:3] "A, C" "A, D" "C, D"
> stack(u2)
values ind
1 A, B X
2 A, C X
3 B, C X
4 B, C Y
5 A, C Z
6 A, D Z
7 C, D Z
As str(u2) shows, the dim attributes have been dropped, which seems to solve the issue.
My question is: why does u1 fail while u2 succeeds? Is there another way to get stack to work with the tapply output u1 without any postprocessing (like Map(c, u1))?
CodePudding user response:
tapply returns an array (or a list if you set simplify = FALSE), and stack doesn't like array input. The tapply documentation doesn't suggest that there are other output options. From ?tapply (emphasis mine):
simplify: logical; if FALSE, tapply always returns an array of mode "list"; in other words, a list with a dim attribute. If TRUE (the default), then if FUN always returns a scalar, tapply returns an array with the mode of the scalar.
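The two cases described in that passage can be seen with a scalar-returning function. This is only a minimal sketch on the question's data; length is swapped in for combn purely for illustration, and counts and counts_list are hypothetical names.

```r
# Rebuild the data frame from the question
df <- data.frame(
  x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
  y = c("A", "B", "C", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# FUN returns a scalar for every group, so the default simplify = TRUE
# gives an atomic (integer) 1-d array
counts <- tapply(df$y, df$x, length)
str(counts)

# simplify = FALSE always gives a list that carries a dim attribute
counts_list <- tapply(df$y, df$x, length, simplify = FALSE)
str(counts_list)
```

When FUN returns results of differing lengths, as combn does in the question, tapply cannot simplify and the result is again a list with a dim attribute.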
So I'd recommend casting to character:
stack(lapply(u1, as.character))
# values ind
# 1 A, B X
# 2 A, C X
# 3 B, C X
# 4 B, C Y
# 5 A, C Z
# 6 A, D Z
# 7 C, D Z
If you're concerned about speed, you could run benchmarks; removing the dim attribute might be faster than as.character():
stack(lapply(u1, "dim<-", NULL))
# same result
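One way such a benchmark might look, as a rough sketch using only base R. The helper names (via_character, via_dim_null, via_c) and the repetition count n_reps are assumptions made for this illustration, and timings on a list this small will be noisy, so treat any difference with caution.

```r
# Rebuild df and u1 from the question
df <- data.frame(
  x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
  y = c("A", "B", "C", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)
u1 <- with(df, tapply(y, x, combn, 2, toString))

# Three ways to strip the dim attribute from each element before stacking
via_character <- function(u) stack(lapply(u, as.character))
via_dim_null  <- function(u) stack(lapply(u, "dim<-", NULL))
via_c         <- function(u) stack(lapply(u, c))

# Check that all three produce the same data frame
same_result <- identical(via_character(u1), via_dim_null(u1)) &&
  identical(via_character(u1), via_c(u1))
print(same_result)

# Rough elapsed time for n_reps calls of each variant
n_reps <- 1000
timings <- sapply(
  list(as.character = via_character, dim_null = via_dim_null, c = via_c),
  function(f) system.time(for (i in seq_len(n_reps)) f(u1))[["elapsed"]]
)
print(timings)
```

For more reliable numbers, a dedicated benchmarking package and a larger input would be a better choice than system.time.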
CodePudding user response:
Alternatively, you can use as.vector or c to convert each 1-d array into a plain vector with no dim attribute:
stack(lapply(u1, c))
values ind
1 A, B X
2 A, C X
3 B, C X
4 B, C Y
5 A, C Z
6 A, D Z
7 C, D Z
According to ?stack:
Note that stack applies to vectors (as determined by is.vector): non-vector columns (e.g., factors) will be ignored with a warning.
is.vector returns FALSE for every element of u1, because each element carries a dim attribute:
> sapply(u1, is.vector)
X Y Z
FALSE FALSE FALSE
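The dim attribute on each element comes from combn itself, which returns a 1-d array when its FUN yields scalars. A minimal sketch, assuming the df from the question: dropping the attribute inside the function passed to tapply should give elements that stack accepts, so no separate step is needed after tapply. The names one_group, u3, elements_ok and stacked are hypothetical.

```r
# Rebuild the data frame from the question
df <- data.frame(
  x = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z"),
  y = c("A", "B", "C", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# combn() with a scalar-returning FUN gives a 1-d array, not a plain vector
one_group <- combn(c("A", "B", "C"), 2, toString)
dim(one_group)
is.vector(one_group)

# Wrap the combn() call in c() so each group's result is a plain vector
u3 <- with(df, tapply(y, x, function(v) c(combn(v, 2, toString))))
elements_ok <- sapply(u3, is.vector)
print(elements_ok)

# stack() filters on is.vector() per element, so u3 can be passed directly
stacked <- stack(u3)
print(stacked)
```

This still flattens each result, but it does so inside tapply rather than in a second pass over u1.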
However, this works with enframe:
library(tibble)
library(tidyr)
enframe(u1) %>%
unnest(value)
# A tibble: 7 × 2
name value
<chr> <chr>
1 X A, B
2 X A, C
3 X B, C
4 Y B, C
5 Z A, C
6 Z A, D
7 Z C, D