How to map a function on a nested Tibble in R when there are NA in nested column?

Time:01-17

Consider a Tibble like this:

library(tidyverse)

df <- tibble(
  station = c("station1", "station2"),
  data = c(list(NA), list(tibble(timestamp = c("2001-01-01", "2002-01-02", "2002-01-03"), value = c(1, 2, 3))))
)

I now want to map a function, such as lubridate::ymd(), over the timestamp column in the nested tibble. The problem is that the parent data column of df contains NA values, and those need to stay in place.

Is there a working solution?

Thank you very much!

I tried several things with mutate, mutate_if, map, map_if, and else_if, but nothing worked for me.

CodePudding user response:

You can use conditionals within the mapping function:

df2 <- df %>%
  mutate(data = map(data, function(x) {
    # Parse timestamps only for nested tibbles; return NA entries unchanged
    if (is.data.frame(x)) {
      mutate(x, timestamp = lubridate::ymd(timestamp))
    } else {
      x
    }
  }))

df2
#> # A tibble: 2 x 2
#>    station  data            
#>    <chr>    <list>          
#>  1 station1 <lgl [1]>       
#>  2 station2 <tibble [3 x 2]>

df2$data
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> # A tibble: 3 x 2
#>   timestamp  value
#>   <date>     <dbl>
#> 1 2001-01-01     1
#> 2 2002-01-02     2
#> 3 2002-01-03     3
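
The same conditional can also be pulled into a named helper and applied with base R's lapply(). This is only a minimal sketch: parse_timestamps is a hypothetical name chosen for illustration, and the data is the example from the question.

```r
library(tibble)

df <- tibble(
  station = c("station1", "station2"),
  data = list(
    NA,
    tibble(timestamp = c("2001-01-01", "2002-01-02", "2002-01-03"), value = c(1, 2, 3))
  )
)

# Hypothetical helper: convert timestamps only when the element is a data frame
parse_timestamps <- function(x) {
  if (!is.data.frame(x)) {
    return(x)  # NA placeholders pass through unchanged
  }
  x$timestamp <- lubridate::ymd(x$timestamp)
  x
}

# lapply() returns a list, so the result can be assigned back as a list column
df$data <- lapply(df$data, parse_timestamps)
```

Keeping the check in one named function makes it easy to reuse if more nested columns need the same treatment.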

CodePudding user response:

Another option would be to use map_if with is.data.frame:

library(tidyverse)

df <- tibble(
  station = c("station1", "station2"),
  data = c(list(NA), list(tibble(timestamp = c("2001-01-01", "2002-01-02", "2002-01-03"), value = c(1, 2, 3))))
)

df <- df |> 
  mutate(data = map_if(data, is.data.frame, ~ mutate(.x, timestamp = lubridate::ymd(timestamp)))) 

df$data
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> # A tibble: 3 × 2
#>   timestamp  value
#>   <date>     <dbl>
#> 1 2001-01-01     1
#> 2 2002-01-02     2
#> 3 2002-01-03     3

CodePudding user response:

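A possible solution with unnest()/nest(). Note that the NA entry for station1 does not stay a bare NA; it becomes a one-row tibble of NA values:

> df2 <- df %>% unnest(data) %>% mutate(timestamp = lubridate::ymd(timestamp)) %>% nest(data = c(timestamp, value))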
> df2
# A tibble: 2 × 2
  station  data            
  <chr>    <list>          
1 station1 <tibble [1 × 2]>
2 station2 <tibble [3 × 2]>
> df2$data
[[1]]
# A tibble: 1 × 2
  timestamp value
  <date>    <dbl>
1 NA           NA

[[2]]
# A tibble: 3 × 2
  timestamp  value
  <date>     <dbl>
1 2001-01-01     1
2 2002-01-02     2
3 2002-01-03     3
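
In the output above, station1 ends up with a one-row tibble of NA values instead of the original NA. If the bare NA has to be kept, one possible follow-up step is to collapse all-NA tibbles back to NA after nesting. This is a rough sketch, assuming that a nested tibble consisting entirely of NA values can only come from an NA placeholder; a station with genuinely all-missing data would be collapsed too.

```r
library(tidyverse)

df <- tibble(
  station = c("station1", "station2"),
  data = c(list(NA), list(tibble(timestamp = c("2001-01-01", "2002-01-02", "2002-01-03"), value = c(1, 2, 3))))
)

df2 <- df %>%
  unnest(data) %>%
  mutate(timestamp = lubridate::ymd(timestamp)) %>%
  nest(data = c(timestamp, value)) %>%
  # Assumption: a tibble whose cells are all NA stands in for an original NA entry
  mutate(data = map(data, ~ if (all(is.na(.x))) NA else .x))
```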