Using apply on list-columns in R seems inconsistent

Time:08-17

..or at least inconsistent with my intuition.

I'm trying to extract data from inside a listcolumn using apply - in the example I've got a column of tibbles called eagles:

df1 <- tibble(
  location = c(1,2),
  eagles = list(
    tibble(
      talons = c(2,3,4),
      beaks = c("blue","red","red")),
    tibble(
      talons = c(2,3),
      beaks = c("red","red"))))

and extracting the beaks values as vectors using apply:

df1$beakz <- apply(df1, 1, \(x) x$eagles$beaks)

which works as expected:

> df1
# A tibble: 2 x 3
  location eagles           beakz    
     <dbl> <list>           <list>   
1        1 <tibble [3 x 2]> <chr [3]>
2        2 <tibble [2 x 2]> <chr [2]>

However if I add another row to one of the nested tibbles, the apply function won't play along anymore:

df2 <- tibble(
  location = c(1,2),
  eagles = list(
    tibble(
      talons = c(2,3,4),
      beaks = c("blue","red","red")),
    tibble(
      talons = c(2,3,2),
      beaks = c("red","red","yellow"))))

df2$beakz <- apply(df2, 1, \(x) x$eagles$beaks)
Error:
! Assigned data `apply(df2, 1, function(x) x$eagles$beaks)` must be compatible with existing data.
x Existing data has 2 rows.
x Assigned data has 3 rows.
i Only vectors of size 1 are recycled.

The expected output would be adding a listcolumn beakz with two vectors (of length 3) as elements.

Additionally, if both the nested tibbles have two rows only, the apply function does work, but instead of a single new listcolumn, I get two new columns:

df3 <- tibble(
   location = c(1,2),
   eagles = list(
     tibble(
       talons = c(2,3),
       beaks = c("blue","red")),
     tibble(
       talons = c(2,3),
       beaks = c("red","red"))))
df3$beakz <- apply(df3, 1, \(x) x$eagles$beaks)
df3
# A tibble: 2 x 3
  location eagles           beakz[,1] [,2] 
     <dbl> <list>           <chr>     <chr>
1        1 <tibble [2 x 2]> blue      red  
2        2 <tibble [2 x 2]> red       red  

This is a grossly simplified example, but basically, I would expect apply to function the same way in all three cases: I would like to extract a column as a vector and bring it up a level. Ideally using apply, although I'm sure there are purrr ways of doing this. But mainly I would just like to understand why this works this way, because debugging it has not been much fun :lolsob:

(also would appreciate it if someone with enough reputation could add listcolumn or list-column to the tags)

CodePudding user response:

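This is happening because apply() simplifies its result whenever it can. In df2 both calls return a vector of length 3, so instead of a list apply() returns a 3x2 matrix (one column per row of df2), which has too many rows to be put into df2. In df1 the two results have different lengths (3 and 2), so nothing can be simplified and you get the list you expected. In df3 both results have length 2, so you get a 2x2 matrix; it happens to have the right number of rows, so it is stored as a matrix column, but note that each of its rows holds the n-th beak of every eagle rather than all the beaks of one eagle. To get what you want for df2 you could e.g. coerce the matrix to a data frame (to give the columns names) and then to a list. There's probably a more elegant way to do it. But basically apply() does not play well with the list structure of your data, whereas the purrr functions do.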
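To see the simplification rule on its own, here is a minimal sketch on a plain numeric matrix. The matrix `m` and both anonymous functions are made up for illustration and are not from the question; the point is only that equal-length results come back as matrix columns, while unequal lengths come back as a list.

```r
# Toy input with 2 rows, so FUN is called twice (once per row).
m <- matrix(1:6, nrow = 2)   # row 1 is 1 3 5, row 2 is 2 4 6

# Every call returns a vector of length 3:
# the results become the COLUMNS of a 3 x 2 matrix.
same_len <- apply(m, 1, function(r) r * 10)
dim(same_len)       # 3 2
same_len[, 1]       # 10 30 50, i.e. the result for row 1

# The calls return vectors of different lengths (1 and 2):
# nothing can be simplified, so a plain list comes back.
diff_len <- apply(m, 1, function(r) r[seq_len(r[1])])
class(diff_len)     # "list"
lengths(diff_len)   # 1 2
```

This matches the three cases in the question: df1 is the "different lengths" case, while df2 and df3 are both the "same length" case and only differ in whether the resulting matrix happens to have as many rows as the tibble.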

apply(df2, 1, \(x) x$eagles$beaks)
#>      [,1]   [,2]    
#> [1,] "blue" "red"   
#> [2,] "red"  "red"   
#> [3,] "red"  "yellow"

class(apply(df2, 1, \(x) x$eagles$beaks))
#> [1] "matrix" "array"

df2$beakz <- as.list(data.frame(apply(df2, 1, \(x) x$eagles$beaks)))

df2
#> # A tibble: 2 × 3
#>   location eagles           beakz       
#>      <dbl> <list>           <named list>
#> 1        1 <tibble [3 × 2]> <chr [3]>   
#> 2        2 <tibble [3 × 2]> <chr [3]>

df2$beakz
#> $X1
#> [1] "blue" "red"  "red" 
#> 
#> $X2
#> [1] "red"    "red"    "yellow"
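If you would rather stay with apply(), one option that should work is to switch the simplification off. This is only a sketch: it assumes R 4.1.0 or later (where apply() gained the `simplify` argument), and it rebuilds df2 from base data frames as a stand-in for the tibble so that it runs without extra packages.

```r
# Stand-in for df2 using base data frames (assumption: for this purpose
# it behaves like the tibble version in the question).
df2 <- data.frame(location = c(1, 2))
df2$eagles <- list(
  data.frame(talons = c(2, 3, 4), beaks = c("blue", "red", "red")),
  data.frame(talons = c(2, 3, 2), beaks = c("red", "red", "yellow"))
)

# simplify = FALSE keeps one list element per row,
# even though both results have the same length.
beakz <- apply(df2, 1, function(x) x$eagles$beaks, simplify = FALSE)

# A list with one element per row can be assigned as a list-column.
df2$beakz <- beakz
lengths(df2$beakz)   # 3 3
```

With `simplify = FALSE` the result is a list regardless of the lengths involved, so the same call should also cover the df1 and df3 cases.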

CodePudding user response:

Purely for reference (not debugging OP), purrr works without issue:

library(dplyr)  # for mutate()
library(purrr)  # for map()

> mutate(df1, beaks=map(eagles, ~ .x$beaks))
# A tibble: 2 × 3
  location eagles           beaks    
     <dbl> <list>           <list>   
1        1 <tibble [3 × 2]> <chr [3]>
2        2 <tibble [2 × 2]> <chr [2]>

> mutate(df2, beaks=map(eagles, ~ .x$beaks))
# A tibble: 2 × 3
  location eagles           beaks    
     <dbl> <list>           <list>   
1        1 <tibble [3 × 2]> <chr [3]>
2        2 <tibble [3 × 2]> <chr [3]>

> mutate(df3, beaks=map(eagles, ~ .x$beaks))
# A tibble: 2 × 3
  location eagles           beaks    
     <dbl> <list>           <list>   
1        1 <tibble [2 × 2]> <chr [2]>
2        2 <tibble [2 × 2]> <chr [2]>
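For comparison, a rough base R counterpart of the map() call is lapply() over the list-column itself, rather than apply() over the rows. This sketch uses a base data frame stand-in for df3 (an assumption made so it runs without extra packages); the lapply() line should work the same way on the tibble.

```r
# Stand-in for df3 using base data frames.
df3 <- data.frame(location = c(1, 2))
df3$eagles <- list(
  data.frame(talons = c(2, 3), beaks = c("blue", "red")),
  data.frame(talons = c(2, 3), beaks = c("red", "red"))
)

# lapply() always returns a list with one element per input element,
# so equal-length results are never collapsed into a matrix.
df3$beaks <- lapply(df3$eagles, function(e) e$beaks)

class(df3$beaks)     # "list"
lengths(df3$beaks)   # 2 2
```

Because it iterates over the column and never converts the whole data frame to a matrix, this avoids the simplification step entirely.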