Keep levels based on multiple conditions on another column in r data frame

Time:08-28

I have a data frame that looks like this:

spp     year    month     count
 1      2020      2         2
 1      2020      2         3
 1      2020      5         4
 1      2020      5         3
 1      2021      2         2
 1      2021      2         4
 2      2020      2         2
 2      2020      2         6
 2      2020      5         3
 3      2021      2         4
 3      2021      2         4
 4      2020      2         3
 4      2020      2         6
 4      2020      5         5
 4      2020      5         7

I only want to keep the species that 1) have at least two observations per month and 2) have observations in at least two different months. I want to end up with something like this:
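Both conditions depend on how many rows each species has in each month, so it can help to tabulate those counts first. This is a minimal sketch. It assumes the data frame is called df (the question doesn't name it) and that months are pooled across years, which is also what the answers below do. It uses dplyr's count() and summarise():

```r
library(dplyr)

# Sample data retyped from the question (the name `df` is an assumption)
df <- data.frame(
  spp   = rep(1:4, c(6, 3, 2, 4)),
  year  = rep(c(2020L, 2021L, 2020L, 2021L, 2020L), c(4, 2, 3, 2, 4)),
  month = c(2L, 2L, 5L, 5L, 2L, 2L, 2L, 2L, 5L, 2L, 2L, 2L, 2L, 5L, 5L),
  count = c(2L, 3L, 4L, 3L, 2L, 4L, 2L, 6L, 3L, 4L, 4L, 3L, 6L, 5L, 7L)
)

# Rows per species and month (years pooled), stored in column `n`
month_counts <- count(df, spp, month)
month_counts

# For each species, how many of its months have at least two rows
species_summary <- month_counts %>%
  group_by(spp) %>%
  summarise(months_ok = sum(n >= 2))
species_summary
```

Species 1 and 4 each have two months with at least two rows. Species 2 has only one row in month 5, and species 3 was only observed in month 2, so neither reaches two qualifying months.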

spp     year    month     count
 1      2020      2         2
 1      2020      2         3
 1      2020      5         4
 1      2020      5         3
 1      2021      2         2
 1      2021      2         4
 4      2020      2         3
 4      2020      2         6
 4      2020      5         5
 4      2020      5         7

I'm only working with two months in 2020 (2 and 5) and one month in 2021 (2). I think filter() from the dplyr package might work, but I have no idea how to go about it.

Thanks in advance.

Answer:

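You can do this with two group_by() and filter() steps. First, keep only the species-month groups that have at least two rows. Then keep only the species that still have at least two distinct months: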

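library(dplyr)
df %>%
  # keep species-month groups with at least two rows (years are pooled)
  group_by(spp, month) %>%
  filter(n() >= 2) %>%
  # of the remaining rows, keep species that span at least two distinct months
  group_by(spp) %>%
  filter(n_distinct(month) >= 2) %>%
  ungroup()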
#> # A tibble: 10 × 4
#>      spp  year month count
#>    <int> <int> <int> <int>
#>  1     1  2020     2     2
#>  2     1  2020     2     3
#>  3     1  2020     5     4
#>  4     1  2020     5     3
#>  5     1  2021     2     2
#>  6     1  2021     2     4
#>  7     4  2020     2     3
#>  8     4  2020     2     6
#>  9     4  2020     5     5
#> 10     4  2020     5     7

Created on 2022-08-27 with reprex v2.0.2
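Grouping by spp and month pools the same month across years, so species 1's February 2020 and February 2021 rows are counted together. The two-observation rule might instead be meant per year and month. That reading is an assumption, because the question doesn't settle it. Under that reading, one option is to add year to the first grouping only, so the second step still counts calendar months. On this sample it returns the same 10 rows:

```r
library(dplyr)

# Sample data retyped from the question (the name `df` is an assumption)
df <- data.frame(
  spp   = rep(1:4, c(6, 3, 2, 4)),
  year  = rep(c(2020L, 2021L, 2020L, 2021L, 2020L), c(4, 2, 3, 2, 4)),
  month = c(2L, 2L, 5L, 5L, 2L, 2L, 2L, 2L, 5L, 2L, 2L, 2L, 2L, 5L, 5L),
  count = c(2L, 3L, 4L, 3L, 2L, 4L, 2L, 6L, 3L, 4L, 4L, 3L, 6L, 5L, 7L)
)

res_year_month <- df %>%
  # assumed reading: at least two rows per species *and* year-month
  group_by(spp, year, month) %>%
  filter(n() >= 2) %>%
  # still require at least two distinct calendar months per species
  group_by(spp) %>%
  filter(n_distinct(month) >= 2) %>%
  ungroup()
res_year_month
```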

Answer:

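Alternatively, use add_count() to store the per-month row count as a column, then filter on it within each species: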

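library(dplyr)
df %>%
  group_by(spp) %>%
  # within each species, n = number of rows for that month
  add_count(month) %>%
  # keep months with n >= 2, and only species with at least two such months
  filter(n >= 2, n_distinct(month[n >= 2]) >= 2) %>%
  ungroup() %>%
  select(-n)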

Output:

# A tibble: 10 × 4
     spp  year month count
   <int> <int> <int> <int>
 1     1  2020     2     2
 2     1  2020     2     3
 3     1  2020     5     4
 4     1  2020     5     3
 5     1  2021     2     2
 6     1  2021     2     4
 7     4  2020     2     3
 8     4  2020     2     6
 9     4  2020     5     5
10     4  2020     5     7
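If you'd rather not depend on dplyr, the same two steps can be written in base R with ave(), which returns a per-group summary aligned to every row. This sketch assumes the same data frame name, df, and pools years like both answers above:

```r
# Sample data retyped from the question (the name `df` is an assumption)
df <- data.frame(
  spp   = rep(1:4, c(6, 3, 2, 4)),
  year  = rep(c(2020L, 2021L, 2020L, 2021L, 2020L), c(4, 2, 3, 2, 4)),
  month = c(2L, 2L, 5L, 5L, 2L, 2L, 2L, 2L, 5L, 2L, 2L, 2L, 2L, 5L, 5L),
  count = c(2L, 3L, 4L, 3L, 2L, 4L, 2L, 6L, 3L, 4L, 4L, 3L, 6L, 5L, 7L)
)

# Step 1: rows per species-month, repeated on every row; drop months with < 2 rows
n_month <- ave(df$count, df$spp, df$month, FUN = length)
keep_rows <- df[n_month >= 2, ]

# Step 2: distinct months per species among the surviving rows; keep species with >= 2
n_months <- ave(keep_rows$month, keep_rows$spp, FUN = function(m) length(unique(m)))
result <- keep_rows[n_months >= 2, ]
rownames(result) <- NULL
result
```

ave() keeps its result row-aligned with the input, so each count can be used directly as a logical row filter. This plays the same role as n() inside a grouped filter().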