number
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
12 5
13 5
14 5
15 5
How would I go about dropping all rows whose value in number
appears in fewer than 5 rows? For example, the tibble above would become:
number
1 5
2 5
3 5
4 5
5 5
If I wanted to drop all rows whose value in number
appears in fewer than 4 rows, the tibble would become:
number
1 4
2 4
3 4
4 4
5 5
6 5
7 5
8 5
9 5
I've heard I could add a count variable holding the number of rows for each value in number
and then filter on it, but I'm not sure how to code this.
CodePudding user response:
Perhaps using functions from the dplyr package:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- read.table(text = " number
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 4
10 4
11 5
12 5
13 5
14 5
15 5", header = TRUE)
df %>%
  group_by(number) %>%
  filter(n() >= 5)
#> # A tibble: 5 × 1
#> # Groups: number [1]
#> number
#> <int>
#> 1 5
#> 2 5
#> 3 5
#> 4 5
#> 5 5
If you want to drop all rows whose value in number
appears in fewer than 4 rows:
df %>%
  group_by(number) %>%
  filter(n() >= 4)
#> # A tibble: 9 × 1
#> # Groups: number [2]
#> number
#> <int>
#> 1 4
#> 2 4
#> 3 4
#> 4 4
#> 5 5
#> 6 5
#> 7 5
#> 8 5
#> 9 5
Created on 2022-10-17 by the reprex package (v2.0.1)
CodePudding user response:
Group by the column, add a column holding the number of rows in each group, and then keep only the rows whose group count meets the threshold:
library(dplyr)
df2 <- df %>%
  group_by(number) %>%
  mutate(groupCount = n()) %>%
  filter(groupCount > 4)
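As a rough sketch of the same count-then-filter idea, dplyr's add_count() can add the per-value count in one step and does not leave the result grouped. The helper name keep_frequent and its min_rows argument are hypothetical, introduced here only for illustration, and the data frame is rebuilt from the question's example.

```r
library(dplyr)

# Same data as in the question: the value k appears k times
df <- data.frame(number = rep(1:5, times = 1:5))

# Hypothetical helper: keep rows whose `number` value occurs
# in at least `min_rows` rows
keep_frequent <- function(data, min_rows) {
  data %>%
    add_count(number, name = "groupCount") %>%  # rows per distinct value
    filter(groupCount >= min_rows) %>%          # keep sufficiently frequent values
    select(-groupCount)                         # drop the helper column again
}

res5 <- keep_frequent(df, 5)  # keeps only the rows with number == 5
res4 <- keep_frequent(df, 4)  # keeps the rows with number == 4 or 5
```

Wrapping the threshold in an argument avoids editing the pipeline each time the cutoff changes. If the count column is useful later, the select() step can simply be dropped.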
CodePudding user response:
x <- rep(1:5, 1:5)
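# Count how often each value occurs; rleid() returns run IDs, not counts,
# and only matched here because each value equals its own frequency
fltr <- ave(x, x, FUN = length)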
x[fltr >= 5]
#> [1] 5 5 5 5 5
Created on 2022-10-17 with reprex v2.0.2
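For a data frame rather than a bare vector, a base R sketch along the same lines might tabulate the column and keep the rows whose value is frequent enough. The names counts, keep and res are made up for this example, and the threshold of 4 is taken from the question.

```r
# Same data as in the question: the value k appears k times
df <- data.frame(number = rep(1:5, times = 1:5))

counts <- table(df$number)          # number of rows per distinct value
keep <- names(counts)[counts >= 4]  # values occurring in at least 4 rows (as character)

# %in% coerces both sides to a common type, so integer vs. character is fine here
res <- df[df$number %in% keep, , drop = FALSE]
```

This needs no packages, at the cost of the integer-to-character comparison hidden inside %in%.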