Home > Software design >  How to get keep groups in R which meet a certain conditions using dplyr
How to get keep groups in R which meet a certain conditions using dplyr

Time:09-19

I have the following data:

structure(list(ISIN = c("AEA000201011", "AEA000201011", "AEA000201011", 
"AEA000201011", "AEA000201011", "AEA000201011", "AEA000201011", 
"AEA000201011", "AEA000201011", "AEA000201011", "AEA000801018", 
"AEA000801018", "AEA000801018", "AEA000801018", "AEA000801018", 
"AEA000801018", "AEA000801018", "AEA000801018", "AEA000801018", 
"AEA000801018", "AEA001501013", "AEA001501013", "AEA001501013", 
"AEA001501013", "AEA001501013", "AEA001501013", "AEA001501013", 
"AEA001501013", "AEA001501013", "AEA001501013"), year = c(2010L, 
2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L, 
2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 
2019L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 
2018L, 2019L), refin_esg = c(NA, NA, NA, NA, 48.74, 54.32, 63.81, 
67.64, 71.74, 68.17, NA, NA, NA, NA, NA, NA, NA, NA, NA, 30.89, 
NA, NA, NA, NA, 13.47, 14.63, 19.68, 17.51, 13.92, 15.15), refin_e = c(NA, 
NA, NA, NA, 23.21, 53.02, 60.18, 18.64, 50.25, 52.21, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 20.52, NA, NA, NA, NA, 0, 0, 0, 0, 
0, 0), mktcap = c(2723907, 4219977, 4576565, 9207428, 9954352, 
9329029, 9765366, 9623839, 11548606, 15002191, 1918568, 2047331, 
2047331, 3734126, 4655646, 3399201, 3277575, 3260324, 3856508, 
5329892, 641477, 647176, 957332, 2452771, 3505987, 1571026, 1646012, 
971968, 775941, 526823)), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -30L), groups = structure(list(
    ISIN = c("AEA000201011", "AEA000801018", "AEA001501013"), 
    .rows = structure(list(1:10, 11:20, 21:30), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

The data is grouped by ISIN. I only want to keep those ISINs which have at least three non-NA values across all years for each column. E.g., if an ISIN only has data for 2012 and 2013 for any column, and the rest of the years in that column are NA, then that ISIN gets dropped.

CodePudding user response:

Does this work?

df |> 
  group_by(ISIN) |>
  filter(min(across(everything(), \(x) sum(!is.na(x)))) > 2)
  • Related