I've been working on a dataset with rolling subs for a netball game. I'm trying to find the last 5 players to be left on the bench, but there is another column where players swap on. This causes issues as a player could sub off in the last 5 rows then sub back on later, so I can't simply use tail(5). So if the player swaps back on, then effectively she isn't the last on the bench. Here's a dataframe below, with the last 8 values.
sub_off | sub_on | time |
---|---|---|
Alexia | Jorja | 750 |
Amy | Sophie | 770 |
Jorja | Cassandra | 800 |
Lucy | Jane | 820 |
Claire | Khloe | 820 |
Cassie | Alexia | 850 |
Khloe | Claire | 890 |
Georgia | Khloe | 895 |
Notice I want to pick the 5 players that remain on the bench. These would be Georgia, Cassie, Lucy, Jorja and Amy. This is because Khloe, Claire and Alexia sub back on later in the game.
In other words, I want to pick the last 5 values in the sub_off column that don't appear in the rows following for the sub_on column.
df <- data.frame(
sub_off = c('Alexia', 'Amy', 'Jorja', 'Lucy', 'Claire', 'Cassie', 'Khloe', 'Georgia'),
sub_on = c('Jorja', 'Sophie', 'Cassandra', 'Jane', 'Khloe', 'Alexia', 'Claire', 'Khloe'),
time = c(750, 770, 800, 820, 820, 850, 890, 895)
)
df
Here's the dataframe, any help would be appreciated. Thanks!
CodePudding user response:
A data.table option using an anti-join between sub_off
and sub_on
where the time
is later:
library(data.table)
setDT(df)
df[!df, on=c("sub_off"="sub_on","time<time"), .(bench=sub_off)]
# bench
#1: Amy
#2: Jorja
#3: Lucy
#4: Cassie
#5: Georgia