Applying Between function to a list column

Time:09-17

I'm working with a data frame where each row contains 3 columns: a left bound, a right bound, and a list column of values called "pool".

    test <- structure(list(left = c(645, 1292, 220, 450), right = c(669, 
1309, 230, 600), pool = list(structure(c(1242L, 1469L), match.length = c(6L, 
6L), index.type = "chars", useBytes = TRUE), structure(c(223L, 
833L, 987L, 1513L, 1759L, 1805L, 2244L), match.length = c(6L, 
6L, 6L, 6L, 6L, 6L, 6L), index.type = "chars", useBytes = TRUE), 
    structure(223L, match.length = 6L, index.type = "chars", useBytes = TRUE), 
    structure(c(248L, 491L, 568L, 811L, 1151L, 1200L), match.length = c(6L, 
    6L, 6L, 6L, 6L, 6L), index.type = "chars", useBytes = TRUE))), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))


What I'm trying to accomplish is

  1. for each row, are any of the values in the pool between the left and right bounds? This should return a vector of FALSE FALSE TRUE TRUE
  2. which element(s) of the list are between the bounds? Should return NA NA 223 (491,568)

I'm getting close using lapply and specifying the index of each row individually.

> lapply(test$pool[1], between, left = test$left[1], right = test$right[1])
[[1]]
[1] FALSE FALSE

> lapply(test$pool[2], between, left = test$left[2], right = test$right[2])
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> lapply(test$pool[3], between, left = test$left[3], right = test$right[3])
[[1]]
[1] TRUE

> lapply(test$pool[4], between, left = test$left[4], right = test$right[4])
[[1]]
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE

But I'm not good enough with lists yet to get my head around how to apply this over all the rows of the dataframe for part 1, or to extract out the list entries for part 2.

Thanks for your expertise!

CodePudding user response:

You can create a small function that checks whether any values fall between left and right and returns those values (if any), then apply that function rowwise:

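library(dplyr)
library(tidyr)  # for unnest_wider()

# Return whether any pool values lie within [l, r], and which ones
f <- function(l, r, p) {
  bvals <- between(p, l, r)
  list(hasvals = any(bvals), vals = p[bvals])
}

test %>% 
  rowwise() %>% 
  mutate(r = list(f(left, right, pool))) %>% 
  unnest_wider(r)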

Output:

   left right pool      hasvals vals     
  <dbl> <dbl> <list>    <lgl>   <list>   
1   645   669 <int [2]> FALSE   <NULL>   
2  1292  1309 <int [7]> FALSE   <NULL>   
3   220   230 <int [1]> TRUE    <int [1]>
4   450   600 <int [6]> TRUE    <int [2]>
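For comparison, the same row-by-row idea can be sketched in base R with `Map()`, which is close to the `lapply` attempt in the question but walks all rows at once. This is only an illustration: the data below is a simplified copy of `test` (the `match.length` and other attributes are dropped), and `in_bounds` is a hypothetical helper name, not part of any package.

```r
# Simplified copy of the question's data (attributes on pool omitted)
left  <- c(645, 1292, 220, 450)
right <- c(669, 1309, 230, 600)
pool  <- list(
  c(1242L, 1469L),
  c(223L, 833L, 987L, 1513L, 1759L, 1805L, 2244L),
  223L,
  c(248L, 491L, 568L, 811L, 1151L, 1200L)
)

# Hypothetical helper: keep the values of p inside [l, r], bounds inclusive
in_bounds <- function(p, l, r) p[p >= l & p <= r]

# Map() iterates over the three inputs in parallel, one row at a time
vals <- Map(in_bounds, pool, left, right)

# Part 1: a row qualifies if at least one value survived the filter
hasvals <- lengths(vals) > 0
hasvals
```

Here `vals` plays the role of the `vals` column and `hasvals` the logical column; both could be assigned back onto the data frame if needed.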

CodePudding user response:

We can use rowwise, create a temporary column with a logical index for every element of pool values, use this index to create the logical between_any and integer values columns, then finally remove the index column.

library(dplyr)

test %>% rowwise() %>% 
    mutate(between = list(between(pool, left, right)),
           between_any = any(between),
           values = list(pool[between])) %>% 
    select(-between) %>% 
    ungroup()

# A tibble: 4 × 5
   left right pool      between_any values   
  <dbl> <dbl> <list>    <lgl>       <list>   
1   645   669 <int [2]> FALSE       <int [0]>
2  1292  1309 <int [7]> FALSE       <int [0]>
3   220   230 <int [1]> TRUE        <int [1]>
4   450   600 <int [6]> TRUE        <int [2]>
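
The question asked for `NA` where nothing matches, whereas the `values` column holds empty integer vectors for those rows. A minimal sketch of one way to convert them, assuming a list shaped like that column (the `values` list below is typed in by hand for illustration, and `values_na` is just a name chosen here):

```r
# Stand-in for the `values` column: empty vectors where nothing matched
values <- list(integer(0), integer(0), 223L, c(491L, 568L))

# Replace each empty vector with a single NA, leave the rest untouched
values_na <- lapply(values, function(v) if (length(v) == 0) NA_integer_ else v)
values_na
```

The same anonymous function could presumably be applied inside `mutate()` on the list column, though whether `NA` or an empty vector is more convenient depends on what happens to the column afterwards.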