Home > OS >  Select last values based on rows below in another column in R
Select last values based on rows below in another column in R

Time:06-28

I've been working on a dataset with rolling subs for a netball game. I'm trying to find the last 5 players to be left on the bench, but there is another column where players swap on. This causes issues as a player could sub off in the last 5 rows then sub back on later, so I can't simply use tail(5). So if the player swaps back on, then effectively she isn't the last on the bench. Here's a dataframe below, with the last 8 values.

sub_off sub_on time
Alexia Jorja 750
Amy Sophie 770
Jorja Cassandra 800
Lucy Jane 820
Claire Khloe 820
Cassie Alexia 850
Khloe Claire 890
Georgia Khloe 895

Notice I want to pick the 5 players that remain on the bench. These would be Georgia, Cassie, Lucy, Jorja and Amy. This is because Khloe, Claire and Alexia sub back on later in the game.

In other words, I want to pick the last 5 values in the sub_off column that don't appear in the rows following for the sub_on column.

df <- data.frame(
  sub_off = c('Alexia', 'Amy', 'Jorja', 'Lucy', 'Claire', 'Cassie', 'Khloe', 'Georgia'),
  sub_on = c('Jorja', 'Sophie', 'Cassandra', 'Jane', 'Khloe', 'Alexia', 'Claire', 'Khloe'),
  time = c(750, 770, 800, 820, 820, 850, 890, 895)
)
df

Here's the dataframe, any help would be appreciated. Thanks!

CodePudding user response:

A data.table option using an anti-join between sub_off and sub_on where the time is later:

library(data.table)
setDT(df)
df[!df, on=c("sub_off"="sub_on","time<time"), .(bench=sub_off)]
#     bench
#1:     Amy
#2:   Jorja
#3:    Lucy
#4:  Cassie
#5: Georgia
  • Related