..or at least inconsistent with my intuition.
I'm trying to extract data from inside a list-column using apply. In the example I've got a column of tibbles called eagles:
df1 <- tibble(
  location = c(1, 2),
  eagles = list(
    tibble(
      talons = c(2, 3, 4),
      beaks = c("blue", "red", "red")),
    tibble(
      talons = c(2, 3),
      beaks = c("red", "red"))))
and extracting the beaks values as vectors using apply:
df1$beakz <- apply(df1, 1, \(x) x$eagles$beaks)
which works as expected:
> df1
# A tibble: 2 x 3
location eagles beakz
<dbl> <list> <list>
1 1 <tibble [3 x 2]> <chr [3]>
2 2 <tibble [2 x 2]> <chr [2]>
However, if I add another row to one of the nested tibbles, the apply function won't play along anymore:
df2 <- tibble(
  location = c(1, 2),
  eagles = list(
    tibble(
      talons = c(2, 3, 4),
      beaks = c("blue", "red", "red")),
    tibble(
      talons = c(2, 3, 2),
      beaks = c("red", "red", "yellow"))))
df2$beakz <- apply(df2, 1, \(x) x$eagles$beaks)
Error:
! Assigned data `apply(df2, 1, function(x) x$eagles$beaks)` must be compatible with existing data.
x Existing data has 2 rows.
x Assigned data has 3 rows.
i Only vectors of size 1 are recycled.
The expected output would be a new list-column beakz with two vectors (of length 3) as elements.
Additionally, if both nested tibbles have only two rows, the apply function does work, but instead of a single new list-column, I get what looks like two new columns:
df3 <- tibble(
  location = c(1, 2),
  eagles = list(
    tibble(
      talons = c(2, 3),
      beaks = c("blue", "red")),
    tibble(
      talons = c(2, 3),
      beaks = c("red", "red"))))
df3$beakz <- apply(df3, 1, \(x) x$eagles$beaks)
df3
# A tibble: 2 x 3
location eagles beakz[,1] [,2]
<dbl> <list> <chr> <chr>
1 1 <tibble [2 x 2]> blue red
2 2 <tibble [2 x 2]> red red
This is a grossly simplified example, but basically I would expect apply to function the same way in all three cases: I would like to extract a column as a vector and bring it up a level. Ideally using apply, although I'm sure there are purrr ways of doing this. But mainly I would just like to understand why this works this way, because debugging it has not been much fun :lolsob:
(also would appreciate it if someone with enough reputation could add listcolumn or list-column to the tags)
CodePudding user response:
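This is happening because apply() simplifies its result whenever it can. If every call returns a vector of the same length n > 1, it returns a matrix with n rows and one column per row of the input; only when the lengths differ does it return a list. For df1 the lengths differ (3 and 2), so you get a list. For df2 both are 3, so you get a 3x2 matrix, which has too many rows to be put into df2. For df3 both are 2, so you get a 2x2 matrix that happens to fit and is stored as a matrix column (note that it is effectively transposed: each input row's result ends up in a column of the matrix). To get it to do what you want you could e.g. coerce the matrix to a data frame (to give the columns names) and then to a list. There's probably a more elegant way to do it. But basically apply() does not play well with the list structure of your data, whereas the purrr functions do.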
apply(df2, 1, \(x) x$eagles$beaks)
#> [,1] [,2]
#> [1,] "blue" "red"
#> [2,] "red" "red"
#> [3,] "red" "yellow"
class(apply(df2, 1, \(x) x$eagles$beaks))
#> [1] "matrix" "array"
df2$beakz <- as.list(data.frame(apply(df2, 1, \(x) x$eagles$beaks)))
df2
#> # A tibble: 2 × 3
#> location eagles beakz
#> <dbl> <list> <named list>
#> 1 1 <tibble [3 × 2]> <chr [3]>
#> 2 2 <tibble [3 × 2]> <chr [3]>
df2$beakz
#> $X1
#> [1] "blue" "red" "red"
#>
#> $X2
#> [1] "red" "red" "yellow"
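A possibly simpler route, offered only as a sketch: either ask apply() not to simplify, or skip the row-wise apply() and lapply() over the list-column directly. This assumes R >= 4.1.0, where apply() gained the simplify argument (the \(x) shorthand used in the question needs the same version). The object names beaks_apply and beaks_lapply are made up for illustration; df2 is rebuilt from the question.

```r
library(tibble)

# Rebuild df2 from the question.
df2 <- tibble(
  location = c(1, 2),
  eagles = list(
    tibble(talons = c(2, 3, 4), beaks = c("blue", "red", "red")),
    tibble(talons = c(2, 3, 2), beaks = c("red", "red", "yellow"))
  )
)

# Option 1: keep apply(), but switch off simplification (R >= 4.1.0).
# The result is then a list with one element per row, whatever the lengths.
beaks_apply <- apply(df2, 1, \(x) x$eagles$beaks, simplify = FALSE)

# Option 2: iterate over the list-column itself with base lapply().
beaks_lapply <- lapply(df2$eagles, \(e) e$beaks)

# Either result has length 2, so it can be assigned as a list-column.
df2$beakz <- beaks_lapply
df2
```

Option 2 avoids the conversion of the whole tibble to a matrix that apply() performs internally, so it is probably the more natural choice when only one column is involved.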
CodePudding user response:
Purely for reference (not debugging OP), purrr works without issue:
library(dplyr)  # mutate() comes from dplyr
library(purrr)
> mutate(df1, beaks=map(eagles, ~ .x$beaks))
# A tibble: 2 × 3
location eagles beaks
<dbl> <list> <list>
1 1 <tibble [3 × 2]> <chr [3]>
2 2 <tibble [2 × 2]> <chr [2]>
> mutate(df2, beaks=map(eagles, ~ .x$beaks))
# A tibble: 2 × 3
location eagles beaks
<dbl> <list> <list>
1 1 <tibble [3 × 2]> <chr [3]>
2 2 <tibble [3 × 2]> <chr [3]>
> mutate(df3, beaks=map(eagles, ~ .x$beaks))
# A tibble: 2 × 3
location eagles beaks
<dbl> <list> <list>
1 1 <tibble [2 × 2]> <chr [2]>
2 2 <tibble [2 × 2]> <chr [2]>
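As a small variation on the calls above, and only as a sketch: map() also accepts a column name in place of a function, in which case it extracts the element of that name from each nested tibble. The example rebuilds df3 from the question (the case where apply() produced a matrix column); the name out is made up for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

# Rebuild df3 from the question.
df3 <- tibble(
  location = c(1, 2),
  eagles = list(
    tibble(talons = c(2, 3), beaks = c("blue", "red")),
    tibble(talons = c(2, 3), beaks = c("red", "red"))
  )
)

# A character string passed to map() plucks the element with that name,
# so this is shorthand for map(eagles, \(e) e$beaks).
out <- mutate(df3, beaks = map(eagles, "beaks"))

# map() always returns a list, so beaks is a list-column, not a matrix.
out$beaks
```

Because map() never simplifies, the result has the same shape for df1, df2 and df3 alike.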