Efficient way of subsetting nested list conditionally in R

Time:12-13

I have a large set of large, named, nested lists. The first-level names vary, while the second-level elements are named according to some rules (examples provided below).

An example of a correct list (x) is given below.

x <- list(`first-group` = list(val = c(534L, 582L, 298L, 645L, 314L, 
237L, 418L, 348L, 363L, 133L, 493L, 721L, 722L, 210L, 467L, 474L, 
145L, 638L, 545L, 330L, 709L, 712L, 674L, 492L, 262L, 663L, 609L, 
142L, 428L, 254L), co = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L)), `second-group` = list(val = c(505L, 647L, 
88L, 208L, 801L, 258L, 423L, 83L, 565L, 62L, 118L, 804L, 458L, 
357L, 327L, 138L, 586L, 340L, 473L, 335L, 720L, 170L, 159L, 207L, 
113L, 532L, 526L, 529L, 760L, 116L, 712L, 134L, 214L, 697L, 100L, 
123L, 227L, 411L, 285L, 659L, 379L, 775L, 176L), co = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), 
    `third-group` = list(val = c(713L, 721L, 683L, 526L, 699L, 
    555L, 563L, 672L, 619L, 603L, 588L, 533L, 622L, 724L, 616L, 
    644L, 730L, 716L, 660L, 663L, 611L, 669L, 644L, 664L, 679L, 
    514L, 579L, 525L, 533L, 541L, 530L, 564L, 584L, 673L, 592L, 
    726L, 548L, 563L, 727L, 646L, 708L, 557L, 586L, 592L, 693L, 
    620L, 548L, 705L, 510L, 677L, 539L, 603L, 726L, 525L, 597L, 
    563L, 712L), co = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0)), `fourth-group` = list(val = c(142L, 317L, 
    286L, 174L, 656L, 299L, 676L, 206L, 645L, 755L, 514L, 424L, 
    719L, 741L, 711L, 552L, 550L, 372L, 551L, 520L, 650L, 503L, 
    667L, 162L, 644L, 595L, 322L, 247L), co = c(0L, 0L, 0L, 0L, 
    1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L)))

Bespoke lists are produced from datasets which may contain some errors. Since the lists are large, it is hard to spot the errors. The structure of such an erroneous list is preserved, although some variables are of the wrong type (e.g. character or NA instead of numeric).

An example of a wrong list (wrong_x) is also given below.

wrong_x <- list(`first-group` = list(val = "this/is/character/variable", 
    co = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 
    1L, 0L)), `second-group` = list(val = c(505L, 647L, 88L, 
208L, 801L, 258L, 423L, 83L, 565L, 62L, 118L, 804L, 458L, 357L, 
327L, 138L, 586L, 340L, 473L, 335L, 720L, 170L, 159L, 207L, 113L, 
532L, 526L, 529L, 760L, 116L, 712L, 134L, 214L, 697L, 100L, 123L, 
227L, 411L, 285L, 659L, 379L, 775L, 176L), co = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `third-group` = list(
    val = c(713L, 721L, 683L, 526L, 699L, 555L, 563L, 672L, 619L, 
    603L, 588L, 533L, 622L, 724L, 616L, 644L, 730L, 716L, 660L, 
    663L, 611L, 669L, 644L, 664L, 679L, 514L, 579L, 525L, 533L, 
    541L, 530L, 564L, 584L, 673L, 592L, 726L, 548L, 563L, 727L, 
    646L, 708L, 557L, 586L, 592L, 693L, 620L, 548L, 705L, 510L, 
    677L, 539L, 603L, 726L, 525L, 597L, 563L, 712L), co = c(0, 
    0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `fourth-group` = list(
    val = NA, co = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L)))
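
One way to see which groups are affected is to list the class of each "val" element. The following is only a minimal sketch in base R; `small` is a hypothetical stand-in with the same shape as the lists above, and `val_classes` and `bad_groups` are illustrative names.

```r
# Small hypothetical stand-in with the same shape as the lists in the question
small <- list(
  `first-group`  = list(val = "this/is/character/variable", co = c(0L, 1L)),
  `second-group` = list(val = c(505L, 647L), co = c(0, 0)),
  `fourth-group` = list(val = NA, co = c(1L, 0L))
)

# One class per group; anything other than "integer" or "numeric" is suspect
val_classes <- vapply(small, function(g) class(g$val)[1], character(1))
print(val_classes)

# Names of the groups whose 'val' is not numeric
is_num <- vapply(small, function(g) is.numeric(g$val), logical(1))
bad_groups <- names(small)[!is_num]
print(bad_groups)
```

Note that a bare NA has class "logical", so is.numeric() rejects it just like a character value.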

It might also happen that every sublist of interest in the list has the wrong variable type, as in the example below:

wrong2_x <- list(`first-group` = list(val = "this/is/character/variable", 
    co = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 
    1L, 0L)), `second-group` = list(val = "this/is/character/variable/too", co = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `third-group` = list(
    val = "and/this", co = c(0, 
    0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `fourth-group` = list(
    val = NA, co = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L)))

I wrote a simple function which mimics my workflow. It filters on the "$val" sublists (whether they are numeric or not). If the resulting prefiltered list is empty, the workflow should stop immediately and throw an error. The code is provided below:

my_function <- function(input_list){
  # data prefiltering
  input_list <- Filter(function(x) is.numeric(x$val), input_list)
  
  # condition
  if (length(input_list) == 0){
    stop("Better Call Saul.", call. = FALSE)
  } else {
    # other data-wrangling steps would go here; this is just a dummy assignment
    output_list <- input_list
  }
  return(output_list)
}

Is there a more elegant (code-efficient) way to achieve the same result?

CodePudding user response:

The purrr package helps with list manipulation. For example:

library(purrr)

is_faulty_list <- function(the_list){
    the_list |> 
        map('val') |> ## pluck list members named 'val'
        discard(~ is.numeric(.x)) |> ## keep only not numeric items
        length() ## should be zero (if only numeric items)
}

if(is_faulty_list(x)) print('calling Saul')
## prints nothing: every 'val' in x is numeric

if(is_faulty_list(wrong_x)) print('calling Saul')
#> [1] "calling Saul"
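
The check above only reports whether any group is faulty. If the goal is the workflow from the question (drop the faulty groups and stop only when nothing is left), purrr's keep() can do the filtering. This is a rough sketch rather than a definitive implementation: `keep_numeric_groups`, the error message, and the `ok` and `bad` test lists are hypothetical names and data introduced here for illustration.

```r
library(purrr)

# Hypothetical helper mirroring my_function() from the question
keep_numeric_groups <- function(input_list) {
  # keep only the groups whose 'val' element is numeric
  kept <- keep(input_list, ~ is.numeric(.x$val))
  if (length(kept) == 0) {
    stop("No group has a numeric 'val'.", call. = FALSE)
  }
  kept
}

# Small hypothetical inputs: one partly wrong, one entirely wrong
ok  <- list(a = list(val = 1:3, co = c(0, 1, 0)),
            b = list(val = "oops", co = c(0, 0)))
bad <- list(a = list(val = "oops", co = 0),
            b = list(val = NA, co = 1))

# Only group "a" survives the filter
print(names(keep_numeric_groups(ok)))

# Every group is dropped, so the function stops; capture the message instead
result <- tryCatch(keep_numeric_groups(bad),
                   error = function(e) conditionMessage(e))
print(result)
```

keep() preserves the names of the retained elements, so later steps can still refer to groups by name.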