Home > Software design >  Subdividing/filtering a dataframe with two row-wise parameters in list
Subdividing/filtering a dataframe with two row-wise parameters in list

Time:10-16

I need to filter/subdivide a dataframe considering two parameters (for exclusion) that are in a dbl_df list (39 obs; below are the first ones to exemplify).

parameters <- structure(list(V1 = c("A", "A", "B", "C"), V2 = c(2018, 2019, 
2020, 2019)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))

Both V1 and V2 are contained in the df to be filtered.

df <- structure(list(V_L = c("AS", "AS", "EU", "AF", "AS", "AM", "EU", 
"EU"), V1 = c("A", "A", "B", "C", "A", "D", "B", "C"), V2 = c(2018, 
2019, 2020, 2019, 2021, 2019, 2019, 2020), value = c(12, 13, 
14, 11, 8, 16, 14, 20)), row.names = c(NA, -8L), spec = structure(list(
    cols = list(V_L = structure(list(), class = c("collector_character", 
    "collector")), V1 = structure(list(), class = c("collector_character", 
    "collector")), V2 = structure(list(), class = c("collector_double", 
    "collector")), value = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = "\t"), class = "col_spec"), problems = <pointer: 0x000001f92fdddf90>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

df_filtered:

enter image description here

I believe I have come close with help in other questions here on the forum, however I believe with the map_df function I am looking at the end of a bind rows.

map_df(parameters, ~ df %>% filter(!V1 == .x[2] & !V2 == .x[2]))

Can anyone troubleshoot or suggest another more operational proposal.

CodePudding user response:

Loading data (the given df didn't load for me without tweaking)

parameters <- structure(list(V1 = c("A", "A", "B", "C"), V2 = c(2018, 2019, 
2020, 2019)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))

df <- structure(list(V_L = c("AS", "AS", "EU", "AF", "AS", "AM", "EU", 
"EU"), V1 = c("A", "A", "B", "C", "A", "D", "B", "C"), V2 = c(2018, 
2019, 2020, 2019, 2021, 2019, 2019, 2020), value = c(12, 13, 
14, 11, 8, 16, 14, 20)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L))

I'd suggest using an anti join, which will return all rows from df which do not match to a row of parameters.

dplyr::anti_join(df, parameters)

Joining, by = c("V1", "V2")
# A tibble: 4 × 4
  V_L   V1       V2 value
  <chr> <chr> <dbl> <dbl>
1 AS    A      2021     8
2 AM    D      2019    16
3 EU    B      2019    14
4 EU    C      2020    20
  • Related