Home > OS >  removing the last entry of each ID that meet a condition
removing the last entry of each ID that meet a condition

Time:12-29

I have a dataset of behavioural events of multiple individual. It looks like:

ID  Behavior  Order
A   Bucket    1
A   Tube      2
A   Bucket    3
A   Tube      4
A   Push      5
B   Bucket    1
B   Tube      2
B   Bucket    3
B   Tube      4
C   Bucket    1
C   Tube      2
C   Push      3
C   Bucket    4
C   Tube      5

I am trying to remove the last entry of individuals where this last entry is the "Push" Behavior.

I first found which individuals' last behavior was "Push" using the following code:

LastBehaviour <- aggregate(Events[,"Behavior"], list(Events$Observation.id), tail, 1)
FixPush<-LastBehaviour[which(LastBehaviour$Behavior=="Push" ),"Group.1"]

Where I'm having trouble is how to remove the last row of the individuals based on the list in "FixPush. I tried the following:

df <- df %>%
  group_by(Observation.id%in%FixPush)%>%
  arrange(Order)%>%
  head(-1) %>%
  glimpse()

Right now this code only removes one row. I'm assuming the last row of the whole dataset. How can I remove the last entry of each individuals where the Behavior is "Push"?

CodePudding user response:

With dplyr, use a grouped filter to remove rows where Behavior is "Push" and row_number() == n() (which means it's the last row in the group).

library(dplyr)

Events %>% 
  group_by(ID) %>% 
  filter(!(Behavior == "Push" & row_number() == n())) %>% 
  ungroup()
# A tibble: 13 × 3
   ID    Behavior Order
   <chr> <chr>    <dbl>
 1 A     Bucket       1
 2 A     Tube         2
 3 A     Bucket       3
 4 A     Tube         4
 5 B     Bucket       1
 6 B     Tube         2
 7 B     Bucket       3
 8 B     Tube         4
 9 C     Bucket       1
10 C     Tube         2
11 C     Push         3
12 C     Bucket       4
13 C     Tube         5

CodePudding user response:

Another option using data.table

Where .SD is your subset of data for each group of ID. Then remove any row where Behavior == "Push" AND Order == .N. Where .N represents the total number of rows per group.

library(data.table)

# sample data
df = data.table(
  ID = c(rep("A",5),rep("B",4), rep("C",5)),
  Behavior = c("Bucket","Tube","Bucket","Tube","Push","Bucket","Tube","Bucket","Tube","Bucket","Tube","Push","Bucket","Tube")
)
df[, Order := 1:.N, by = ID]

# subsetting
df = df[, .SD[!(Behavior == "Push" & Order == .N)], by = ID]
> df
    ID Behavior Order
 1:  A   Bucket     1
 2:  A     Tube     2
 3:  A   Bucket     3
 4:  A     Tube     4
 5:  B   Bucket     1
 6:  B     Tube     2
 7:  B   Bucket     3
 8:  B     Tube     4
 9:  C   Bucket     1
10:  C     Tube     2
11:  C     Push     3
12:  C   Bucket     4
13:  C     Tube     5
  • Related