Home > Enterprise >  How to find previous value in dataframe column based on another columns value
How to find previous value in dataframe column based on another columns value

Time:09-20

I have a data frame containing football matches with columns for:

  • Matchday (date)
  • Home Team
  • Away Team

I want to calculate how long it has been since a team last played. That means I have to find the latest appearance of Team and get the date, i.e. Matchday.

I really have no idea how to do it, except maybe converting the structure to a long format. But as for matching based on team and grabbing the matchday I am lost.

Current structure:

Matchday Team Home_Away
2022-08-05 Arsenal FC Away
2022-08-05 Crystal Palace Home
... ... ...
2022-08-13 Arsenal FC Home
2022-08-13 Leicester City Away 

So I want a column indicating rest days basically:

Matchday Team Home_Away Rest_Days
2022-08-05 Arsenal FC Away NA
2022-08-05 Crystal Palace Home NA
... ... ... ...
2022-08-07 Leicester City Home NA
... ... ... ...
2022-08-13 Arsenal FC Home 8
2022-08-13 Leicester City Away  6

CodePudding user response:

You need to sort the dataframe by team and date (date in ascending order). Then you create a column with the lagged value of date within the Team. For instance with dplyr:

library(dplyr)
data <- 
    data %>%
    group_by(Team) %>%
    mutate(rest_days = Matchday - dplyr::lag(Matchday, n = 1, default = NA))

I could not check if that works exactly in that manner, but the idea (sort, group and then lag) should work..

  • Related