Subsetting nested lists based on condition (values) in R

I have a large nested list (a list of named lists); an example is given below. I would like to create a new list that keeps only the sub-lists whose "co" vectors contain both 0 and 1 values and discards the sub-lists whose "co" vector is all 0 (e.g. the output should contain only the first, third and fourth groups). I played with lapply and filter according to this thread:

Subset elements in a list based on a logical condition

However, it threw errors. I would appreciate tips on how to handle lists within lists.

# reprex
set.seed(123)

## empty lists
first_group <- list()
second_group <- list()
third_group <- list()
fourth_group <- list()

# dummy_vecs
values1 <- c(sample(120:730, 30, replace=TRUE))
coeff1 <- c(sample(0:1, 30, replace=TRUE))

values2 <- c(sample(50:810, 43, replace=TRUE))
coeff2 <- c(rep(0, 43))

values3 <- c(sample(510:730, 57, replace=TRUE))
coeff3 <- c(rep(0, 8), rep(1, 4), rep(0, 45))

values4 <- c(sample(123:770, 28, replace=TRUE))
coeff4 <- c(sample(0:1, 28, replace=TRUE))

## fill lists with values:
first_group[["val"]] <- values1
first_group[["co"]] <- coeff1

second_group[["val"]] <- values2
second_group[["co"]] <- coeff2

third_group[["val"]] <- values3
third_group[["co"]] <- coeff3

fourth_group[["val"]] <- values4
fourth_group[["co"]] <- coeff4

#concatenate lists:
dummy_list <- list()

dummy_list[["first-group"]] <- first_group
dummy_list[["second-group"]] <- second_group
dummy_list[["third-group"]] <- third_group
dummy_list[["fourth-group"]] <- fourth_group

rm(values1, values2, values3, values4, coeff1, coeff2, coeff3, coeff4, first_group, second_group, third_group, fourth_group)
gc()

#show list
print(dummy_list)

CodePudding user response：

# logical vector: TRUE where "co" contains both a 0 and a 1
# (%in% already returns a single TRUE/FALSE here, so any() is not needed)
cond <- sapply(dummy_list, function(x) 0 %in% x$co && 1 %in% x$co)

# subset
dummy_list[cond]
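
A slightly stricter variant of the same idea, as a minimal sketch: vapply() guarantees that the result is a logical vector with one value per sub-list, and all(c(0, 1) %in% x$co) tests for both values at once. The list `toy` and the name `has_both` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data standing in for dummy_list
toy <- list(
  a = list(val = c(10, 20, 30), co = c(0, 1, 0)),  # mixed 0/1 -> keep
  b = list(val = c(40, 50),     co = c(0, 0)),     # only 0    -> drop
  c = list(val = c(60, 70),     co = c(1, 1))      # only 1    -> drop
)

# vapply() enforces that every result is a single logical value
has_both <- vapply(toy, function(x) all(c(0, 1) %in% x$co), logical(1))

# Logical indexing with single brackets keeps the sub-lists and their names
kept <- toy[has_both]
names(kept)  # "a"
```

Single brackets matter here: `toy[has_both]` returns a list of sub-lists, whereas `[[` would try to extract a single element.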

CodePudding user response：

You could use Filter from base R. Since co only holds 0s and 1s, a non-zero sum means at least one 1 is present:

Filter(function(x) sum(x$co) !=0, dummy_list)

Or you can use purrr:

library(purrr)  # keep() lives in purrr; library(tidyverse) loads it too

dummy_list %>%
  keep( ~ sum(.$co) != 0)

Output

$`first-group`
$`first-group`$val
 [1] 534 582 298 645 314 237 418 348 363 133 493 721 722 210 467 474 145 638 545 330 709 712 674 492 262 663 609 142 428 254

$`first-group`$co
 [1] 0 0 1 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 0


$`third-group`
$`third-group`$val
 [1] 713 721 683 526 699 555 563 672 619 603 588 533 622 724 616 644 730 716 660 663 611 669 644 664 679 514 579 525 533 541 530 564 584 673 592 726 548 563 727
[40] 646 708 557 586 592 693 620 548 705 510 677 539 603 726 525 597 563 712

$`third-group`$co
 [1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


$`fourth-group`
$`fourth-group`$val
 [1] 142 317 286 174 656 299 676 206 645 755 514 424 719 741 711 552 550 372 551 520 650 503 667 162 644 595 322 247

$`fourth-group`$co
 [1] 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1

However, if you also want to exclude any sub-list whose co is all 1s, you can add an extra condition:

# && is the scalar "and", which suits a predicate that returns a single TRUE/FALSE
Filter(function(x) sum(x$co) != 0 && sum(x$co == 0) > 0, dummy_list)

Or with purrr:

dummy_list %>%
  keep( ~ sum(.$co) != 0 && sum(.$co == 0) > 0)
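
To see why the extra condition matters, here is a small sketch in base R. The list `toy` and the names `loose` and `strict` are hypothetical and only serve the illustration; the question's dummy_list has no all-1 sub-list, so both conditions give the same result there.

```r
# Hypothetical toy data: one mixed, one all-0 and one all-1 sub-list
toy <- list(
  mixed = list(co = c(0, 1, 1)),
  zeros = list(co = c(0, 0, 0)),
  ones  = list(co = c(1, 1, 1))
)

# First condition only: drops all-0 sub-lists but keeps all-1 ones
loose <- Filter(function(x) sum(x$co) != 0, toy)

# Both conditions: at least one 1 and at least one 0
strict <- Filter(function(x) any(x$co == 1) && any(x$co == 0), toy)

names(loose)   # "mixed" "ones"
names(strict)  # "mixed"
```

The any() form reads closer to the stated requirement than comparing sums, and it does not rely on co holding only 0s and 1s.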