I am working with some tennis ranking data in R, that gives the evolution of tennis rankings over time of all the players on the ATP tour.
An example of the data I am using can be found here, giving the rankings data from 2000's:
CodePudding user response:
Try out roll functions from the zoo
package
library(zoo)
set.seed(2137)
my_data <- rnorm(100)
zoo::rollmean(my_data, 10)
Depending on data types and time frequency of your data it could look like this
rankings %>%
group_by(player) %>%
summarise(
meanRanking = zoo::rollmean(rank, k = 2, na.rm = TRUE),
)
CodePudding user response:
Your data are reasonably large, and data.table can help quite a bit with speed. Here is an approach that is very fast, and it uses a flexible function f(s,e,p,u)
, which allows you to pass in any start (s
) or end (e
) date, an integer period (e.g. 2 for 2 years), and time unit (u
) which takes values "y"
, "m"
, or "d"
, for years, months, days, respectively
f <- function(s,e,p, u=c("y","m","d")) {
u=match.arg(u)
uf = list("y"=years,"m"=months,"d"=days)
data.table(s = seq(as.Date(s), as.Date(e),by=paste(p,u)))[,`:=`(e=s %m % uf[[u]](p), period=1:.N)]
}
Then you just apply the function to dt in a non-equi join
dt[f("2000-01-01", "2008-01-01",2), on=.(ranking_date>=s, ranking_date<=e)] %>%
.[,.(ranking=mean(rank,na.rm=T)), by=.(player,period )]
Output:
player period ranking
1: 101736 1 2.769231
2: 102338 1 5.211538
3: 101948 1 4.730769
4: 103017 1 23.259615
5: 102856 1 2.538462
---
13543: 105996 5 1780.500000
13544: 105050 5 1665.333333
13545: 105757 5 1781.000000
13546: 121555 5 1328.500000
13547: 106018 5 1508.000000
For 3 month windows, just call f
with p
=3, and u
="m"
dt[f("2000-01-01", "2008-01-01",3, "m"), on=.(ranking_date>=s, ranking_date<=e)] %>%
.[,.(ranking=mean(rank,na.rm=T)), by=.(player,period )]
Output (for 3 month windows)
player period ranking
1: 101736 1 1.000000
2: 102338 1 2.666667
3: 101948 1 2.333333
4: 103017 1 4.000000
5: 102856 1 5.500000
---
62491: 105517 33 1502.000000
62492: 104405 33 1588.000000
62493: 104576 33 1588.000000
62494: 105500 33 1679.000000
62495: 108698 33 1844.000000