Which function do I need to apply to get tail of 20 for "a" group not "b" group

Time:03-18

I have split my data frame into groups "a" and "b", and now I want to keep only the tail (the last 20 rows) of group a while leaving group b untouched. Here is the sample data:

#>      id time status displacement group
#> 1    15   1     2       3.4        a
#> 2    15   1     2       3.4        a
#> 3    15   1     2       3.4        a
#> 4    15   1     2       3.4        a
#> 5    15   1     2       3.4        a
#> 6    15   1     2       3.4        a
#> 7    15   1     2       3.4        a
#> 8    15   1     2       3.4        a
#> 9    15   1     2       3.4        b
#> 10   15   1     2       3.4        b
#> 11   15   1     2       3.4        b
#> 12   15   1     2       3.4        b
#> 13   15   1     2       3.4        b
#> 14   15   1     2       3.4        a
#> 15   15   1     2       3.4        a
#> 16   15   1     2       3.4        a
#> 17   15   1     2       3.4        a
#> 18   15   1     2       3.4        a
#> 19   15   1     2       3.4        a
#> 20   15   1     2       3.4        a
#> 21   15   1     2       3.4        a
#> 22   15   1     2       3.4        a
#> 23   15   1     2       3.4        a
#> 24   15   1     2       3.4        a
#> 25   15   1     2       3.4        a
#> 26   15   1     2       3.4        b
#> 27   15   1     2       3.4        b
#> 28   15   1     2       3.4        b
#> 29   15   1     2       3.4        b
#> 30   15   1     2       3.4        b
and so on with this pattern

I only want to keep the last few rows (say 5) of each run of group a, and leave group b unchanged.

Desired output:

#>    id time status displacement group
#> 4  15   1     2       3.4        a
#> 5  15   1     2       3.4        a
#> 6  15   1     2       3.4        a
#> 7  15   1     2       3.4        a
#> 8  15   1     2       3.4        a
#> 9  15   1     2       3.4        b
#> 10 15   1     2       3.4        b
#> 11 15   1     2       3.4        b
#> 12 15   1     2       3.4        b
#> 13 15   1     2       3.4        b
#> 14 15   1     2       3.4        a
#> 15 15   1     2       3.4        a
#> 16 15   1     2       3.4        a
#> 17 15   1     2       3.4        a
#> 18 15   1     2       3.4        a
#> 19 15   1     2       3.4        b
#> 20 15   1     2       3.4        b
#> 21 15   1     2       3.4        b
#> 22 15   1     2       3.4        b
#> 23 15   1     2       3.4        b
and so on with this pattern

I know I have to use the group_by function to group rows of the same group together. However, if I group them and then take the tail, it will be applied to every group.

How can I achieve it? Thanks

CodePudding user response:

If we assume your data is stored in a data.frame called dt:

# Row indices of each group
a_rows <- which(dt$group == "a")
b_rows <- which(dt$group == "b")

# Keep the last 5 "a" rows plus every "b" row, in their original order
rows <- sort(c(tail(a_rows, 5), b_rows))

dt[rows, ]

CodePudding user response:

You can subtract the cumulative sum (cumsum) of the group a indicator from the total number of a rows and compare the result with the desired tail length atail (7 in this example) to create a logical subset.

atail <- 7
dat[with(dat, sum(group == 'a') - cumsum(group == 'a') + 1) <= atail |
      dat$group == 'b', ]
#    id time status displacement group
# 9  15    1      2          3.4     b
# 10 15    1      2          3.4     b
# 11 15    1      2          3.4     b
# 12 15    1      2          3.4     b
# 13 15    1      2          3.4     b
# 19 15    1      2          3.4     a
# 20 15    1      2          3.4     a
# 21 15    1      2          3.4     a
# 22 15    1      2          3.4     a
# 23 15    1      2          3.4     a
# 24 15    1      2          3.4     a
# 25 15    1      2          3.4     a
# 26 15    1      2          3.4     b
# 27 15    1      2          3.4     b
# 28 15    1      2          3.4     b
# 29 15    1      2          3.4     b
# 30 15    1      2          3.4     b

Data:

dat <- structure(list(id = c(15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L), time = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), status = c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), displacement = c(3.4, 
3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 
3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 
3.4, 3.4, 3.4), group = c("a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "a", "a", "a", "b", "b", "b", "b", "b")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30"))
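
Both answers above keep the last a rows counted over the whole data frame. If the goal is instead the last 5 rows of each consecutive run of a, which is how the question's "each group a" might be read, a minimal base R sketch could look like the following. The names n_tail, run_id, from_end and res are made up for illustration, and the data is rebuilt compactly with the same group pattern as the sample (8 a, 5 b, 12 a, 5 b); treat it as one possible approach under that assumption, not the only way to do it.

```r
# Rebuild the sample data compactly (assumed pattern: 8 a, 5 b, 12 a, 5 b).
dat <- data.frame(
  id = 15L, time = 1L, status = 2L, displacement = 3.4,
  group = rep(c("a", "b", "a", "b"), times = c(8, 5, 12, 5)),
  stringsAsFactors = FALSE
)

n_tail <- 5  # hypothetical name: rows to keep at the end of each run of "a"

# Label each consecutive run of identical group values: 1, 1, ..., 2, 2, ...
runs <- rle(dat$group)
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# Position of each row counted from the end of its run (1 = last row of the run).
from_end <- ave(seq_len(nrow(dat)), run_id,
                FUN = function(i) rev(seq_along(i)))

# Keep every "b" row, plus the last n_tail rows of each "a" run.
res <- dat[dat$group == "b" | from_end <= n_tail, ]
res
```

With this sample, the rows kept are 4 to 13 and 21 to 30: the last five rows of each a run and every b row. rle() is used for the run labels so that separate runs of a are handled independently, which a plain group_by(group) would not do.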