How to extract value from nested tibble in mutate?

Time:06-03

I have a tibble:

library(dplyr)
my_tib <- tibble(names = c("blah", "blah2"), data = list(1, c(2, 3)))

which looks like this:

  names data     
  <chr> <list>   
1 blah  <dbl [1]>
2 blah2 <dbl [2]>

Now I'd like to extract the first element of each data entry inside a mutate() call, conditional on the entry's length and on whether that first element is below or above 10:

my_tib %>%
  rowwise() %>%
  mutate(data_min = case_when(lengths(data) == 2 ~ data[[1]],
                              lengths(data) == 1 & data[[1]] > 10 ~ NA_integer_,
                              lengths(data) == 1 & data[[1]] < 10 ~ data[[1]]))

When I run this, I get the following error:

Error in `mutate()`:
! Problem while computing `data_min = case_when(...)`.
✖ `data_min` must be size 1, not 2.
ℹ Did you mean: `data_min = list(case_when(...))` ?
ℹ The error occurred in row 2.
Run `rlang::last_error()` to see where the error occurred.

What I'd like to get is

  names data       data_min
  <chr> <list>     <int>
1 blah  <dbl [1]>  1
2 blah2 <dbl [2]>  2

I had a look here but that didn't help me with my particular situation.

Answer:

This can be done more compactly: use map_dbl() to loop over the list column and take the first element of each entry, then use case_when() to keep that value when the entry has length 2, or when it has length 1 and its first element is below 10. Every other case (for example a length-1 entry whose value is above 10) falls through to NA.

library(purrr)
library(dplyr)
my_tib %>% 
  mutate(data_min = map_dbl(data, first), 
         data_min = case_when(lengths(data) == 2 ~ data_min,
                              lengths(data) == 1 & data_min < 10 ~ data_min))
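For comparison, here is a minimal base R sketch of the same idea, following the conditions stated in the question (length 2, or length 1 with a first element below 10). The names my_df, first_val, n and keep are made up for this illustration and do not come from the original post.

```r
# Same data as in the question, built as a base R data frame with a list column
my_df <- data.frame(names = c("blah", "blah2"))
my_df$data <- list(1, c(2, 3))

# First element of every list entry, as a plain double vector
first_val <- vapply(my_df$data, function(x) x[1], numeric(1))

# Number of elements in every list entry
n <- lengths(my_df$data)

# Keep the value for length-2 entries, or for length-1 entries below 10;
# everything else becomes NA
keep <- n == 2 | (n == 1 & first_val < 10)
my_df$data_min <- ifelse(keep, first_val, NA_real_)

my_df$data_min
```

Because lengths() and the comparison are vectorised over the whole column, no row-by-row loop is needed here.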

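In the OP's code, rowwise() already unwraps the list column, so inside mutate() data is the numeric vector for the current row rather than a list. lengths(data) therefore returns one length per element (1 and 1 for the second row), which makes the case_when() result size 2 and triggers the error. Use length(data) for the number of elements and data[1] for the first one. NA_real_ is used instead of NA_integer_ because the values are doubles.

my_tib %>%
  rowwise() %>%
  mutate(data_min = case_when(length(data) == 2 ~ data[1],
                              length(data) == 1 & data[1] > 10 ~ NA_real_,
                              length(data) == 1 & data[1] < 10 ~ data[1]))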

-output

# A tibble: 2 × 3
# Rowwise: 
  names data      data_min
  <chr> <list>       <dbl>
1 blah  <dbl [1]>        1
2 blah2 <dbl [2]>        2
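The "size 1, not 2" error comes down to the difference between length() and lengths(). The base R sketch below assumes that, under rowwise(), the second row's data is the bare vector c(2, 3); the variable x is a stand-in for it and is not part of the original post.

```r
x <- c(2, 3)           # stand-in for `data` in row 2 under rowwise()

per_element <- lengths(x)        # one length per element of x
whole       <- length(x)         # number of elements in x
wrapped     <- lengths(list(x))  # length of the single list entry

per_element  # 1 1
whole        # 2
wrapped      # 2
```

A condition built from per_element has two entries, so a case_when() using it returns two values for a single row, which mutate() rejects.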