I have a data frame that looks like this:
spp year month count
1 2020 2 2
1 2020 2 3
1 2020 5 4
1 2020 5 3
1 2021 2 2
1 2021 2 4
2 2020 2 2
2 2020 2 6
2 2020 5 3
3 2021 2 4
3 2021 2 4
4 2020 2 3
4 2020 2 6
4 2020 5 5
4 2020 5 7
I only want to keep the species that 1) have at least two observations per month and 2) have observations in at least two different months. I want to end up with something like this:
spp year month count
1 2020 2 2
1 2020 2 3
1 2020 5 4
1 2020 5 3
1 2021 2 2
1 2021 2 4
4 2020 2 3
4 2020 2 6
4 2020 5 5
4 2020 5 7
I'm only working with two months in 2020 (2 and 5) and one month in 2021 (2). I think filter() from the dplyr package might work, but I have no idea how to go about it.
Thanks in advance.
CodePudding user response:
You can use the following code with two filter() calls and two group_by() calls:
library(dplyr)
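df %>%
  # keep only spp/month combinations with at least two observations
  group_by(spp, month) %>%
  filter(n() >= 2) %>%
  # of what remains, keep only species seen in at least two different months
  group_by(spp) %>%
  filter(n_distinct(month) >= 2) %>%
  ungroup()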
#> # A tibble: 10 × 4
#> spp year month count
#> <int> <int> <int> <int>
#> 1 1 2020 2 2
#> 2 1 2020 2 3
#> 3 1 2020 5 4
#> 4 1 2020 5 3
#> 5 1 2021 2 2
#> 6 1 2021 2 4
#> 7 4 2020 2 3
#> 8 4 2020 2 6
#> 9 4 2020 5 5
#> 10 4 2020 5 7
Created on 2022-08-27 with reprex v2.0.2
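The question does not include code to build the data, so here is a minimal sketch for reproducing the result. It assumes the table from the question is stored in a data frame called df with integer columns (as the tibble output above suggests); the name result is only used here for illustration.

```r
library(dplyr)

# Rebuild the example data from the question (values copied from its table).
df <- data.frame(
  spp   = rep(c(1L, 2L, 3L, 4L), times = c(6, 3, 2, 4)),
  year  = c(2020L, 2020L, 2020L, 2020L, 2021L, 2021L,
            2020L, 2020L, 2020L,
            2021L, 2021L,
            2020L, 2020L, 2020L, 2020L),
  month = c(2L, 2L, 5L, 5L, 2L, 2L,
            2L, 2L, 5L,
            2L, 2L,
            2L, 2L, 5L, 5L),
  count = c(2L, 3L, 4L, 3L, 2L, 4L,
            2L, 6L, 3L,
            4L, 4L,
            3L, 6L, 5L, 7L)
)

# Same two-step filter as in the answer above.
result <- df %>%
  group_by(spp, month) %>%
  filter(n() >= 2) %>%
  group_by(spp) %>%
  filter(n_distinct(month) >= 2) %>%
  ungroup()

# Species that survive both filters.
unique(result$spp)
```

Note that the grouping uses month only, so month 2 of 2020 and month 2 of 2021 are pooled into one group for a given species.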
CodePudding user response:
We may also do
library(dplyr)
df1 %>%
  group_by(spp) %>%
  add_count(month) %>%
  filter(n >= 2, n_distinct(month[n >= 2]) >= 2) %>%
  ungroup() %>%
  select(-n)
Output:
# A tibble: 10 × 4
spp year month count
<int> <int> <int> <int>
1 1 2020 2 2
2 1 2020 2 3
3 1 2020 5 4
4 1 2020 5 3
5 1 2021 2 2
6 1 2021 2 4
7 4 2020 2 3
8 4 2020 2 6
9 4 2020 5 5
10 4 2020 5 7
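To see what the add_count() step contributes, here is a small sketch on a made-up data frame (toy, counted and kept are hypothetical names, not from the question). It assumes add_count() is called on data already grouped by spp, as in the code above, so the helper column n holds the number of rows per spp/month combination.

```r
library(dplyr)

# Hypothetical toy data: species 1 has two months with two rows each,
# species 2 has month 2 twice but month 5 only once.
toy <- data.frame(
  spp   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  month = c(2L, 2L, 5L, 5L, 2L, 2L, 5L)
)

# Step 1: attach the per spp/month row count as column n.
counted <- toy %>%
  group_by(spp) %>%
  add_count(month)

# Step 2: keep rows whose month has at least two observations, and only
# for species that have at least two such months.
kept <- counted %>%
  filter(n >= 2, n_distinct(month[n >= 2]) >= 2) %>%
  ungroup() %>%
  select(-n)

kept
```

Here species 2 is dropped entirely: its month 5 has a single row, which leaves only one month that meets the two-observation condition.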