I have the following dataframe:
df1 <- data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
                  var1 = c(0, 2, 3, 4, 2, 5, 6, 10, 11, 0, 1, 2, 1, 5, 7, 10))
Within each ID, I want to keep only the rows up to and including the first row where var1 reaches 5. Once 5 is reached, the selection should move on to the next ID and do the same there, so that the final result looks like this:
ID var1
1 0
1 2
1 3
1 4
1 2
1 5
2 0
2 1
2 2
2 1
2 5
I would like to try something with dplyr, as it is what I am most familiar with.
CodePudding user response:
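In base R you can select rows with a logical condition on a column: index the data frame as dataframe[condition, ], where the condition can combine comparisons such as ==, != and <= using & or |.
In your case you can do this:
df1[df1$var1 <= 5, ]
Note that this keeps every row with var1 <= 5, so it matches the desired output only because no value of 5 or less appears after the first 5 within either ID.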
Reference: How to Select Rows in R with Examples
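If values of 5 or less can reappear after the threshold has been reached, a plain var1 <= 5 condition would keep them as well. The following is a minimal base R sketch that stops at the first value >= 5 within each ID instead. It assumes the rows are already in the intended order; the names hit, keep and res are only illustrative.

```r
df1 <- data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
                  var1 = c(0, 2, 3, 4, 2, 5, 6, 10, 11, 0, 1, 2, 1, 5, 7, 10))

# TRUE wherever var1 has reached the threshold
hit <- df1$var1 >= 5

# Within each ID, the doubled cumulative sum is 0 before the first hit,
# exactly 1 on the first hit, and greater than 1 on every later row
keep <- ave(hit, df1$ID, FUN = function(h) cumsum(cumsum(h)) <= 1)

res <- df1[keep, ]
res
```

Because the check is cumulative, a small value that shows up after the first 5 is dropped rather than kept.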
CodePudding user response:
You could use which.max() to find the position of the first occurrence of var1 >= 5 within each ID, and then keep the rows up to and including that position.
library(dplyr)
df1 %>%
  group_by(ID) %>%
  filter(row_number() <= which.max(var1 >= 5)) %>%
  ungroup()
or
df1 %>%
  group_by(ID) %>%
  slice(1:which.max(var1 >= 5)) %>%
  ungroup()
# # A tibble: 11 × 2
# ID var1
# <dbl> <dbl>
# 1 1 0
# 2 1 2
# 3 1 3
# 4 1 4
# 5 1 2
# 6 1 5
# 7 2 0
# 8 2 1
# 9 2 2
# 10 2 1
# 11 2 5
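One edge case worth knowing: when a group never reaches 5, var1 >= 5 is all FALSE and which.max() returns 1, so only the first row of that group would be kept. A possible variant, sketched here on hypothetical data (df2 is not from the question), uses match() with nomatch = n() so that such a group is kept in full.

```r
library(dplyr)

# Hypothetical data: ID 3 never reaches 5
df2 <- data.frame(ID = c(1, 1, 1, 3, 3, 3),
                  var1 = c(0, 5, 6, 1, 2, 3))

res <- df2 %>%
  group_by(ID) %>%
  # match() gives the position of the first TRUE, or the group size if there is none
  filter(row_number() <= match(TRUE, var1 >= 5, nomatch = n())) %>%
  ungroup()

res
```

Whether a group without any 5 should be kept whole or dropped depends on the use case; this sketch assumes it should be kept.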
CodePudding user response:
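Another approach: you can use cummax() inside filter() to keep rows for as long as the running maximum of var1 does not exceed 5. Note that this keeps every row before the first value above 5, so repeated 5s, and any smaller values that follow them, are included as well.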
library(dplyr)
df1 %>%
  group_by(ID) %>%
  filter(cummax(var1) <= 5)
Output
ID var1
<dbl> <dbl>
1 1 0
2 1 2
3 1 3
4 1 4
5 1 2
6 1 5
7 2 0
8 2 1
9 2 2
10 2 1
11 2 5
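On the question's data the cummax() and which.max() filters return the same rows, but they can disagree on other inputs. A small sketch on hypothetical data (df3 is made up to show the difference):

```r
library(dplyr)

# Hypothetical data: a repeated 5 followed by a smaller value, then a value above 5
df3 <- data.frame(ID = c(1, 1, 1, 1, 1),
                  var1 = c(1, 5, 5, 3, 6))

# Keeps rows while the running maximum is at most 5
by_cummax <- df3 %>%
  group_by(ID) %>%
  filter(cummax(var1) <= 5) %>%
  ungroup()

# Keeps rows up to and including the first value >= 5
by_first <- df3 %>%
  group_by(ID) %>%
  filter(row_number() <= which.max(var1 >= 5)) %>%
  ungroup()

by_cummax$var1  # 1 5 5 3
by_first$var1   # 1 5
```

Which behaviour is wanted depends on whether "up to 5" means "until the first 5" or "until 5 is exceeded".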