I have split my data frame into groups "a" and "b". Now I want to keep only the tail (the last 20 rows) of group a, and leave group b untouched. Here is the sample data:
#> id time status displacement group
#> 1 15 1 2 3.4 a
#> 2 15 1 2 3.4 a
#> 3 15 1 2 3.4 a
#> 4 15 1 2 3.4 a
#> 5 15 1 2 3.4 a
#> 6 15 1 2 3.4 a
#> 7 15 1 2 3.4 a
#> 8 15 1 2 3.4 a
#> 9 15 1 2 3.4 b
#> 10 15 1 2 3.4 b
#> 11 15 1 2 3.4 b
#> 12 15 1 2 3.4 b
#> 13 15 1 2 3.4 b
#> 14 15 1 2 3.4 a
#> 15 15 1 2 3.4 a
#> 16 15 1 2 3.4 a
#> 17 15 1 2 3.4 a
#> 18 15 1 2 3.4 a
#> 19 15 1 2 3.4 a
#> 20 15 1 2 3.4 a
#> 21 15 1 2 3.4 a
#> 22 15 1 2 3.4 a
#> 23 15 1 2 3.4 a
#> 24 15 1 2 3.4 a
#> 25 15 1 2 3.4 a
#> 26 15 1 2 3.4 b
#> 27 15 1 2 3.4 b
#> 28 15 1 2 3.4 b
#> 29 15 1 2 3.4 b
#> 30 15 1 2 3.4 b
and so on with this pattern
I only want to keep the last few rows (say 5 rows) of each block of group a, while group b stays the same.
Desired output:
#> id time status displacement group
#> 4 15 1 2 3.4 a
#> 5 15 1 2 3.4 a
#> 6 15 1 2 3.4 a
#> 7 15 1 2 3.4 a
#> 8 15 1 2 3.4 a
#> 9 15 1 2 3.4 b
#> 10 15 1 2 3.4 b
#> 11 15 1 2 3.4 b
#> 12 15 1 2 3.4 b
#> 13 15 1 2 3.4 b
#> 14 15 1 2 3.4 a
#> 15 15 1 2 3.4 a
#> 16 15 1 2 3.4 a
#> 17 15 1 2 3.4 a
#> 18 15 1 2 3.4 a
#> 19 15 1 2 3.4 b
#> 20 15 1 2 3.4 b
#> 21 15 1 2 3.4 b
#> 22 15 1 2 3.4 b
#> 23 15 1 2 3.4 b
and so on with this pattern
I know I have to use the group_by function to group rows of the same group together. However, if I group them and then take the tail, it will be applied to every group.
How can I achieve this? Thanks.
CodePudding user response:
If we assume your data is stored in a data.frame called dt:
# Row indices of each group
a_rows <- which(dt$group == "a")
b_rows <- which(dt$group == "b")
# Keep the last 5 "a" rows overall plus every "b" row, in the original order
rows <- sort(c(tail(a_rows, 5), b_rows))
dt[rows, ]
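The desired output in the question appears to keep the last 5 rows of each consecutive block of "a", not only the last 5 "a" rows of the whole data frame. A minimal base R sketch of that reading, using rle() to find the blocks, could look like this. The function name tail_per_run, its arguments n and target, and the rebuilt sample data are assumptions for illustration, not part of the original answer.

```r
# Sample data rebuilt to mirror the question: blocks of 8 a, 5 b, 12 a, 5 b
dat <- data.frame(id = 15L, time = 1L, status = 2L, displacement = 3.4,
                  group = rep(c("a", "b", "a", "b"), times = c(8, 5, 12, 5)))

# Hypothetical helper: keep the last `n` rows of every consecutive run of
# `target`; rows belonging to any other group are kept unchanged.
tail_per_run <- function(df, n = 5, target = "a") {
  runs <- rle(as.character(df$group))
  # Length of the run each row belongs to
  run_len <- rep(runs$lengths, runs$lengths)
  # Position of each row inside its run (1, 2, ..., run length)
  pos <- sequence(runs$lengths)
  # Position counted from the end of the run (1 = last row of the run)
  from_end <- run_len - pos + 1
  keep <- df$group != target | from_end <= n
  df[keep, , drop = FALSE]
}

res <- tail_per_run(dat, n = 5, target = "a")
rownames(res)  # rows 4 to 13 and 21 to 30
```

Because the trimming is driven by run positions, the same call should work when the a/b pattern repeats more often, as long as the rows are already in the intended order.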
CodePudding user response:
You can subtract the cumsum of the rows that are in group a from their total count and compare the result with the desired tail length atail (7 in this example) to create a Boolean subset.
atail <- 7
dat[with(dat, sum(group == 'a') - cumsum(group == 'a') + 1) <= atail |
      dat$group == 'b', ]
# id time status displacement group
# 9 15 1 2 3.4 b
# 10 15 1 2 3.4 b
# 11 15 1 2 3.4 b
# 12 15 1 2 3.4 b
# 13 15 1 2 3.4 b
# 19 15 1 2 3.4 a
# 20 15 1 2 3.4 a
# 21 15 1 2 3.4 a
# 22 15 1 2 3.4 a
# 23 15 1 2 3.4 a
# 24 15 1 2 3.4 a
# 25 15 1 2 3.4 a
# 26 15 1 2 3.4 b
# 27 15 1 2 3.4 b
# 28 15 1 2 3.4 b
# 29 15 1 2 3.4 b
# 30 15 1 2 3.4 b
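To see why the cumsum comparison selects the tail, it may help to trace it on a tiny vector. The sketch below uses a made-up six-element group vector and atail of 2 (both are assumptions for illustration, not the question's data) and prints the intermediate countdown.

```r
# Toy input (hypothetical): five "a" rows with one "b" in between
group <- c("a", "a", "b", "a", "a", "a")
atail <- 2

is_a <- group == "a"
# For an "a" row this is its rank counted from the last "a" row (1 = last "a")
remaining <- sum(is_a) - cumsum(is_a) + 1
remaining
# 5 4 4 3 2 1

# Keep "a" rows whose countdown is within the tail, plus every non-"a" row
keep <- remaining <= atail | !is_a
which(keep)
# 3 5 6
group[keep]
# "b" "a" "a"
```

Note that the countdown runs over all "a" rows in the data frame, so this approach keeps the last atail "a" rows overall rather than the last rows of each separate block.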
Data:
dat <- structure(list(id = c(15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L), time = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), status = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), displacement = c(3.4,
3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
3.4, 3.4, 3.4), group = c("a", "a", "a", "a", "a", "a", "a",
"a", "b", "b", "b", "b", "b", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "b", "b", "b", "b", "b")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30"))