Filtering out data frame from list

Time:11-22

I had a large data frame that I grouped and then split into a list of over 400 tibbles. Some of the tibbles in this list have one column with only 0s as entries, and I would like to remove those tibbles from the list (or the corresponding groups from the data frame).

A smaller sample of what my data looks like can be seen here:

 dfa <- data.frame(intensity.x = c(10, 20, 100, 30 , 40), intensity.y = c(100, 30, 0.0, 20, 0), group = c('a', 'a', 'a', 'a', 'a'))
dfb <- data.frame(intensity.x = c(100, 10, 45, 60 , 43), intensity.y = c(0, 0, 0, 0, 0), group = c('b', 'b', 'b', 'b', 'b'))
dfx <- data.frame(intensity.x = c(20, 4, 5, 16 , 3), intensity.y = c(0, 12, 0, 1, 0), group = c('x', 'x', 'x', 'x', 'x'))
dfy <- data.frame(intensity.x = c(10, 10, 30, 20 , 80), intensity.y = c(0, 0, 0, 0, 0), group = c('y', 'y', 'y', 'y', 'y'))
df.big <- rbind(dfa, dfb, dfx, dfy)
df.list <- list(dfa, dfb, dfx, dfy)

Essentially I want groups like dfy and dfb to be filtered out of my large data frame (df.big) or the list (df.list) because all of their intensity.y values are 0, but I can't use

filter(df.big, intensity.y != 0)

because that would also remove the zero-valued rows from groups dfa and dfx, which I want to keep.

Is this possible?

CodePudding user response:

Yes, you can do:

df.list[sapply(df.list, function(df) !all(df$intensity.y == 0))]
#> [[1]]
#>   intensity.x intensity.y group
#> 1          10         100     a
#> 2          20          30     a
#> 3         100           0     a
#> 4          30          20     a
#> 5          40           0     a
#> 
#> [[2]]
#>   intensity.x intensity.y group
#> 1          20           0     x
#> 2           4          12     x
#> 3           5           0     x
#> 4          16           1     x
#> 5           3           0     x

Created on 2022-11-21 with reprex v2.0.2
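The question also asks about filtering df.big directly. A minimal sketch of one way to do that, assuming dplyr is available and that the grouping column is called group as in the sample data: group the rows, then keep only the groups where at least one intensity.y value is non-zero. The name df.kept is made up for this example.

```r
library(dplyr)

# Same values as df.big in the question, built in one call.
df.big <- data.frame(
  intensity.x = c(10, 20, 100, 30, 40, 100, 10, 45, 60, 43,
                  20, 4, 5, 16, 3, 10, 10, 30, 20, 80),
  intensity.y = c(100, 30, 0, 20, 0, rep(0, 5),
                  0, 12, 0, 1, 0, rep(0, 5)),
  group = rep(c("a", "b", "x", "y"), each = 5)
)

# any() is evaluated once per group, so a group is kept or dropped as a whole.
df.kept <- df.big |>
  group_by(group) |>
  filter(any(intensity.y != 0)) |>
  ungroup()

unique(as.character(df.kept$group))
#> [1] "a" "x"
```

Because the condition is a single TRUE or FALSE per group, the zero rows inside groups a and x are left in place.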

CodePudding user response:

Using base R with Filter

Filter(\(x) any(x$intensity.y != 0), df.list)
[[1]]
  intensity.x intensity.y group
1          10         100     a
2          20          30     a
3         100           0     a
4          30          20     a
5          40           0     a

[[2]]
  intensity.x intensity.y group
1          20           0     x
2           4          12     x
3           5           0     x
4          16           1     x
5           3           0     x
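The same Filter idea can be applied to df.big without extra packages. This is only a sketch, assuming the grouping column is named group as in the sample data: split the data frame by group, filter the pieces, and bind the survivors back together. The name df.kept is made up for this example.

```r
# Same values as df.big in the question, built in one call.
df.big <- data.frame(
  intensity.x = c(10, 20, 100, 30, 40, 100, 10, 45, 60, 43,
                  20, 4, 5, 16, 3, 10, 10, 30, 20, 80),
  intensity.y = c(100, 30, 0, 20, 0, rep(0, 5),
                  0, 12, 0, 1, 0, rep(0, 5)),
  group = rep(c("a", "b", "x", "y"), each = 5)
)

# Split into one data frame per group, drop the all-zero ones, recombine.
pieces  <- split(df.big, df.big$group)
kept    <- Filter(\(d) any(d$intensity.y != 0), pieces)
df.kept <- do.call(rbind, kept)
rownames(df.kept) <- NULL  # rbind on a named list yields names like "a.1"

nrow(df.kept)
#> [1] 10
```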

CodePudding user response:

Alternative using purrr:

# Assumes intensity.y is never negative, so a sum of 0 means every value is 0.
df.list |> purrr::keep(\(df) sum(df$intensity.y) != 0)
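One caveat with a sum-based test: it treats a data frame as all-zero whenever its values happen to cancel out. That cannot happen if intensities are never negative, which seems likely here but is an assumption. A sketch of a predicate that does not depend on it, using purrr::discard; the data frames df.zero and df.cancel are hypothetical and exist only to show the difference.

```r
library(purrr)

# Hypothetical inputs: one truly all-zero, one whose values sum to zero.
df.zero   <- data.frame(intensity.y = c(0, 0, 0))
df.cancel <- data.frame(intensity.y = c(-1, 1, 0))
demo.list <- list(df.zero, df.cancel)

# Sum-based test: drops both data frames.
by.sum <- keep(demo.list, \(df) sum(df$intensity.y) != 0)

# Element-wise test: drops only the data frame whose values are all 0.
by.all <- discard(demo.list, \(df) all(df$intensity.y == 0))

length(by.sum)
#> [1] 0
length(by.all)
#> [1] 1
```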
Tags: r