I have a list of a few thousand data frames, and about 95% of them have the information I need to run a specific script on them. However, a dozen data frames in the list are missing one specific value in column Z that the script requires, so I want to filter those data frames out of the list completely. Here's a quick example of what I mean:
> head(list_of_df[[1]])
# A tibble: 6 × 4
A B Z id
<dbl> <dbl> <chr> <int>
1 27.3 0.485 "{\"type\":\"M\",\"msg\":\"VALUE0\",\}" 1
2 27.4 0.457 NA 1
3 27.5 0.430 NA 1
4 27.6 0.402 NA 1
5 27.7 0.374 "{\"type\":\"M\",\"msg\":\"VALUE1\",\}" 1
6 27.8 0.347 NA 1
The minimal example above has "VALUE1" at some point in the Z column, so this data frame is OK. However, if a data frame X had no instances of "VALUE1" in its Z column, then X should be filtered out. How can I do this in R?
As a bonus question, how could I filter out all the data frames in the list that don't have matching numbers of rows with "VALUE0" and "VALUE1" in the Z column?
CodePudding user response:
This should work:
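# Keep only the data frames where at least one Z entry contains "VALUE1"
# (grepl() returns FALSE for NA, so missing Z values are simply ignored)
list_of_df <- list_of_df[vapply(list_of_df, function(x) any(grepl("VALUE1", x$Z, fixed = TRUE)), logical(1))]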
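To check this kind of filter on something small, here is a minimal base R sketch. The toy data frames and the names toy_list, has_value1 and kept are made up for illustration; only the column name Z and the strings "VALUE0" and "VALUE1" come from the question.

```r
# Hypothetical toy data: only the first data frame has "VALUE1" in Z
toy_list <- list(
  data.frame(A = c(27.3, 27.4, 27.5),
             Z = c('{"type":"M","msg":"VALUE0"}', NA,
                   '{"type":"M","msg":"VALUE1"}')),
  data.frame(A = c(1.0, 2.0),
             Z = c('{"type":"M","msg":"VALUE0"}', NA))
)

# One logical per data frame: does any Z entry contain "VALUE1"?
# fixed = TRUE treats the pattern as a plain substring, not a regex.
has_value1 <- vapply(
  toy_list,
  function(df) any(grepl("VALUE1", df$Z, fixed = TRUE)),
  logical(1)
)

# Subset the list with the logical vector
kept <- toy_list[has_value1]
length(kept)
```

Note that a plain substring match on "VALUE1" would also match something like "VALUE10"; if such values can occur, a stricter pattern may be needed.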
CodePudding user response:
Here's a way of doing it using purrr::keep:
(This uses simulated data where the column to check is "a" and the value to test for is "D".)
library(tidyverse)
list_of_dfs <- map(1:20, ~ tibble(a = sample(LETTERS, 10),
                                  b = sample(1:100, 10)))
list_of_dfs %>%
  keep(~ any(str_detect(.x$a, "D")))
#> [[1]]
#> # A tibble: 10 x 2
#> a b
#> <chr> <int>
#> 1 X 15
#> 2 W 36
#> 3 L 69
#> 4 D 63
#> 5 A 23
#> 6 P 72
#> 7 S 30
#> 8 Q 33
#> 9 B 92
#> 10 C 37
#>
#> [[2]]
#> # A tibble: 10 x 2
#> a b
#> <chr> <int>
#> 1 O 28
#> 2 W 85
#> 3 H 15
#> 4 D 53
#> 5 Y 77
#> 6 S 16
#> 7 C 46
#> 8 E 12
#> 9 F 11
#> 10 M 74
#>
#> ... etc.
Created on 2022-03-31 by the reprex package (v2.0.1)
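For the bonus question, one possible base R sketch is to count the rows matching each value and keep only the data frames where the two counts agree. The helper count_rows and the toy data below are hypothetical. The sketch also assumes that a data frame with no "VALUE1" at all should be dropped, as in the main question.

```r
# Hypothetical helper: number of rows whose Z contains `value`
# (NA entries count as non-matches because grepl() returns FALSE for NA)
count_rows <- function(df, value) sum(grepl(value, df$Z, fixed = TRUE))

# Hypothetical toy data covering three cases
toy_list <- list(
  balanced   = data.frame(Z = c('{"msg":"VALUE0"}', NA, '{"msg":"VALUE1"}')),
  unbalanced = data.frame(Z = c('{"msg":"VALUE0"}', '{"msg":"VALUE0"}',
                                '{"msg":"VALUE1"}')),
  neither    = data.frame(Z = c(NA_character_, NA_character_))
)

# Keep data frames with at least one "VALUE1" row and equal counts
matched <- Filter(
  function(df) {
    n0 <- count_rows(df, "VALUE0")
    n1 <- count_rows(df, "VALUE1")
    n1 > 0 && n0 == n1
  },
  toy_list
)
names(matched)
```

Dropping the n1 > 0 condition would also keep data frames that contain neither value, since their counts are both zero.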