I have data for Formula 1 drivers in three columns and want to make a time series plot of the cumulative points for every driver.
The problem is that all my drivers are in the first column, the points in the second, and the cumulative sum in the third.
testdf <- c("Driver A", "Driver A", "Driver A", "Driver B", "Driver B", "Driver B")
values <- c(1,5,7,3,5,8)
driversmatrix <- cbind(testdf, values); driversmatrix
How could I make a time series out of this, with every driver's cumulative points plotted against the others?
CodePudding user response:
library(data.table)
library(ggplot2)
# dummy data
df <- data.table(driver = c("Driver A", "Driver A", "Driver A", "Driver B", "Driver B", "Driver B")
, points = c(1,5,7,3,5,8)
); df
# if your own data is a data.frame rather than a data.table, convert it first with setDT(df)
# calculate cumulative sum and date per driver (assumes data is already sorted in ascending date order)
df[, `:=`(cum_sum = cumsum(points), date = seq_len(.N)), by = driver]
# plot
ggplot(data=df, aes(x=date, y=cum_sum, group=driver)) +
  geom_line(aes(linetype=driver)) +
  geom_point()
Note that plotting one line per driver, as we do here, may not be optimal if there are many drivers, because the plot becomes cluttered.
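One possible way to reduce clutter is to plot only the drivers with the highest final totals. The sketch below is an illustration, not part of the original answer: the four-driver data set, the cutoff `n_keep`, and the helper objects `totals` and `top_drivers` are all assumptions made for the example.

```r
library(data.table)
library(ggplot2)

# hypothetical data: four drivers, three races each
df <- data.table(
  driver = rep(c("Driver A", "Driver B", "Driver C", "Driver D"), each = 3),
  points = c(1, 5, 7, 3, 5, 8, 0, 2, 1, 10, 0, 4)
)

# cumulative sum and race index per driver (rows assumed sorted by date)
df[, `:=`(cum_sum = cumsum(points), date = seq_len(.N)), by = driver]

# n_keep is an assumed cutoff: keep the drivers with the highest final totals
n_keep <- 2
totals <- df[, .(total = sum(points)), by = driver]
top_drivers <- head(totals[order(-total), driver], n_keep)

# plot only the selected drivers, one coloured line each
ggplot(df[driver %in% top_drivers], aes(x = date, y = cum_sum, colour = driver)) +
  geom_line() +
  geom_point()
```

Faceting with `facet_wrap(~ driver)` would be another option if every driver has to stay visible.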
CodePudding user response:
First, you need a column that indicates the race number or date. The code below assumes that your data has the same number of races per driver:
library(tidyverse)
testdf <- data.frame(Driver= c("Driver A", "Driver A", "Driver A", "Driver B", "Driver B", "Driver B") , Points=c(1,5,7,3,5,8))
testdf <- testdf %>% group_by(Driver) %>% mutate(Cum_Points=cumsum(Points), Race_No=row_number())
Then plot cumulative points against the race number, with driver as the colour variable:
ggplot(testdf, aes(Race_No, Cum_Points, colour=Driver)) + geom_line()
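If the drivers do not all have the same number of races, numbering rows per driver would misalign the x-axis. A minimal sketch of one way around this, assuming the data already carries an explicit race number (the `results` data frame and its missing row for Driver B are made up for illustration), is to fill the gaps with zero points before taking the cumulative sum:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# hypothetical data: Driver B has no row for race 2
results <- data.frame(
  Driver  = c("Driver A", "Driver A", "Driver A", "Driver B", "Driver B"),
  Race_No = c(1, 2, 3, 1, 3),
  Points  = c(1, 5, 7, 3, 8)
)

filled <- results %>%
  # add a row for every Driver/Race_No combination; missing races score 0
  complete(Driver, Race_No, fill = list(Points = 0)) %>%
  arrange(Driver, Race_No) %>%
  group_by(Driver) %>%
  mutate(Cum_Points = cumsum(Points)) %>%
  ungroup()

ggplot(filled, aes(Race_No, Cum_Points, colour = Driver)) +
  geom_line()
```

Treating a missed race as zero points keeps each driver's line flat across the gap; whether that is the right reading depends on your data.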