How to filter out all data frames (i.e. list elements) which don't have a single value of "

Time:03-31

I have a list of a few thousand data frames, and about 95% of them contain the information I need to run a specific script on them. However, a dozen or so data frames in the list are missing one specific value in column Z, so the script cannot run on them, and I want to remove those data frames from the list entirely. Here's a quick example of what I mean:

> head(list_of_df[[1]])
# A tibble: 6 × 4
    A     B    Z                                                                        id
  <dbl> <dbl> <chr>                                                                   <int>
1  27.3 0.485 "{\"type\":\"M\",\"msg\":\"VALUE0\",\}"                                     1
2  27.4 0.457  NA                                                                         1
3  27.5 0.430  NA                                                                         1
4  27.6 0.402  NA                                                                         1
5  27.7 0.374  "{\"type\":\"M\",\"msg\":\"VALUE1\",\}"                                    1
6  27.8 0.347  NA                                                                         1

The minimal example above has "VALUE1" somewhere in column Z, so this data frame is OK. If a data frame X had no instance of "VALUE1" in column Z, then X would be filtered out. How can I do this in R?

As a bonus question: how could I filter out all the data frames in the list whose Z column does not have the same number of rows containing "VALUE0" as rows containing "VALUE1"?

Answer:

This should work:

# Keep only the data frames whose Z column contains "VALUE1" at least once
list_of_df <- list_of_df[vapply(list_of_df, function(x) any(grepl("VALUE1", x$Z, fixed = TRUE)), logical(1))]
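The bonus question is not answered in the thread. One possible reading is: keep a data frame only if "VALUE1" appears at least once and the number of rows matching "VALUE0" equals the number of rows matching "VALUE1". The base R sketch below assumes that reading; the toy list_of_df and the helper count_marker() are hypothetical stand-ins for the real data. Note that a plain substring match for "VALUE1" would also hit something like "VALUE10", so a more specific pattern may be needed on real data.

```r
# Hypothetical toy data shaped like the question's list_of_df:
# only the Z column matters here, and many rows are NA.
list_of_df <- list(
  data.frame(id = 1, Z = c('{"msg":"VALUE0"}', NA, '{"msg":"VALUE1"}', NA)),
  data.frame(id = 2, Z = c('{"msg":"VALUE0"}', '{"msg":"VALUE0"}', '{"msg":"VALUE1"}')),
  data.frame(id = 3, Z = c('{"msg":"VALUE0"}', NA, NA))
)

# Hypothetical helper: count the rows of Z containing a marker string.
# grepl() returns FALSE for NA, so NA rows are simply not counted.
count_marker <- function(df, marker) {
  sum(grepl(marker, df$Z, fixed = TRUE))
}

# Keep a data frame only if VALUE1 occurs at least once AND the number of
# VALUE0 rows equals the number of VALUE1 rows (assumed reading of the bonus).
balanced <- Filter(function(df) {
  n1 <- count_marker(df, "VALUE1")
  n1 > 0 && count_marker(df, "VALUE0") == n1
}, list_of_df)

sapply(balanced, function(df) df$id[1])
# [1] 1
```

Filter() keeps the list elements for which the predicate returns TRUE, so only the first toy data frame survives: the second has two VALUE0 rows but one VALUE1 row, and the third has no VALUE1 at all.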

Answer:

Here's a way of doing it using purrr::keep:

(This uses simulated data in which the column to check is named a and the value to look for is "D".)

library(tidyverse)

list_of_dfs <- map(1:20, ~tibble(a = sample(LETTERS, 10),
       b = sample(1:100, 10)))

list_of_dfs %>% 
  keep(~ any(str_detect(.x$a, "D")))

#> [[1]]
#> # A tibble: 10 x 2
#>    a         b
#>    <chr> <int>
#>  1 X        15
#>  2 W        36
#>  3 L        69
#>  4 D        63
#>  5 A        23
#>  6 P        72
#>  7 S        30
#>  8 Q        33
#>  9 B        92
#> 10 C        37
#> 
#> [[2]]
#> # A tibble: 10 x 2
#>    a         b
#>    <chr> <int>
#>  1 O        28
#>  2 W        85
#>  3 H        15
#>  4 D        53
#>  5 Y        77
#>  6 S        16
#>  7 C        46
#>  8 E        12
#>  9 F        11
#> 10 M        74
#> 
#> ... etc.

Created on 2022-03-31 by the reprex package (v2.0.1)
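Applied to the question's own data, the same keep() idea would test column Z for "VALUE1". One detail matters there: Z is mostly NA, and str_detect() returns NA for NA input, so any() should be called with na.rm = TRUE so that the predicate always returns TRUE or FALSE rather than NA. A minimal sketch, assuming purrr, stringr and tibble are installed (all come with the tidyverse) and using hypothetical toy data:

```r
library(purrr)
library(stringr)

# Hypothetical toy data in the same shape as the question (Z is mostly NA).
list_of_df <- list(
  tibble::tibble(id = 1L, Z = c('{"msg":"VALUE0"}', NA, '{"msg":"VALUE1"}')),
  tibble::tibble(id = 2L, Z = c('{"msg":"VALUE0"}', NA, NA))
)

# str_detect() gives NA for NA rows; na.rm = TRUE makes any() return FALSE
# (not NA) for a data frame with no match, so keep() gets a clean TRUE/FALSE.
kept <- keep(list_of_df, ~ any(str_detect(.x$Z, fixed("VALUE1")), na.rm = TRUE))

map_int(kept, ~ .x$id[1])
# [1] 1
```

fixed() makes str_detect() treat "VALUE1" as a literal string rather than a regular expression, which also avoids surprises if the marker ever contains regex metacharacters.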
