tidyr separate_rows with user defined function? (r / tidyverse)

Time:02-20

separate_rows splits the delimited values in a column into multiple rows, repeating the values of the other columns.

> t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
> t %>% separate_rows(x, sep = ",")
# A tibble: 4 × 2
  x         v
  <chr> <dbl>
1 a         1
2 b         1
3 c         2
4 d         2

However, what if I want to apply a function to the separated values? For example, after the split, change the value of x to TRUE if it is in ("a", "b") and FALSE otherwise.

I understand that all I need is a mutate following separate_rows. My question is about the case where I already have a function that both splits and processes a comma-delimited value: how do I use that function in a similar way to separate_rows? (The reason is that I want to keep complex split logic in a function rather than in mutate.)

For example, the function below implements the logic above and returns a vector of values. Is it possible to perform an operation similar to separate_rows with it (i.e., split the column and repeat the other columns' values)?

proc <- function(text){
  text %>% 
    str_split(pattern = ",") %>%
    unlist() %>%
    sapply(function(x){
            if(x %in% c("a", "b")) 
              return(T) 
            else 
              return(F)
          })
}

Answer:

Kind of.

If you keep the output of your function (here proc) in list form instead of unlisting, you can apply that function to x with mutate and then unnest x. Keeping it in list form preserves the info about which element of proc(t$x) corresponds to which row of t, and that info is lost when you unlist.
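As a small illustration of the point about lost row information, the sketch below compares the list form and the unlisted form of the split. The names pieces and flat are only illustrative and are not part of the original question or answer.

```r
library(stringr)

x <- c("a,b", "c,d")

# List form: one list element per input row, so row membership is kept.
pieces <- str_split(x, pattern = ",")
length(pieces)    # 2, the same as the number of rows in the tibble
lengths(pieces)   # 2 2, how many output rows each input row will produce

# Flattened form: one vector of 4 values with no record of which row
# each value came from, so it cannot be assigned to a 2-row column.
flat <- unlist(pieces)
length(flat)      # 4
```

Because the list has exactly one element per row, mutate can store it as a list column, and unnest can then expand each element while repeating the other columns.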

library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

proc <- function(text) {
  text %>%
    str_split(pattern = ",") %>%
    lapply(function(x) {
      x %in% c("a", "b")
    })
}

t <- tibble(x = c("a,b", "c,d"), v = c(1,2))

t %>% 
  mutate(x = proc(x)) %>% 
  unnest(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2

Created on 2022-02-20 by the reprex package (v2.0.1)

But, if you're going to use two functions anyway (mutate and unnest), you may as well just use separate_rows and then mutate.
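For comparison, a minimal sketch of that separate_rows-then-mutate version, using the same example data and the same membership test as above (the name res is only illustrative):

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

res <- t %>%
  separate_rows(x, sep = ",") %>%     # one row per comma-separated piece
  mutate(x = x %in% c("a", "b"))      # then process the separated values

res
```

This should give the same four rows as the mutate and unnest version, with x as a logical column.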

Or, you could pack everything into the proc function.

library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

proc <- function(df, col) {
  fun <- function(text) {
    text %>%
      str_split(pattern = ",") %>%
      lapply(function(x) {
        x %in% c("a", "b")
      })
  }
  df %>% 
    mutate(across({{ col }}, fun)) %>% 
    unnest({{ col }})
}

t <- tibble(x = c("a,b", "c,d"), v = c(1,2))

t %>% 
  proc(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2

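If the processing step should be swappable, the same pattern can be generalised so that the per-row function is passed in as an argument. This is only a sketch: separate_rows_with, its fn and sep arguments, and the inner split_and_apply are hypothetical names, not part of tidyr, and it assumes fn returns vectors of a single type so that unnest can combine them.

```r
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a tidyr function): split `col` on `sep`, apply
# `fn` to each row's character vector of pieces, then expand to one row
# per resulting value while repeating the other columns.
separate_rows_with <- function(df, col, fn, sep = ",") {
  split_and_apply <- function(text) {
    text %>%
      str_split(pattern = sep) %>%
      lapply(fn)                      # keep list form: one element per row
  }
  df %>%
    mutate(across({{ col }}, split_and_apply)) %>%
    unnest({{ col }})
}

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

res <- t %>%
  separate_rows_with(x, function(pieces) pieces %in% c("a", "b"))

res
```

Note that sep is passed to str_split as a regular expression, so a separator such as "." or "|" would need escaping.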
