I'm trying to do a year-to-year comparison for each player in a dataframe so I can plot Year 1 vs Year 2 in a scatterplot on the x and y axis' respectively. I think I need to use a for loop with a nested if_else statement but I feel like that may be overcomplicating things.
Here is what I have:
| player |season|average|
| ------ | ---- |-------|
| A.Lee | 2019 | 45 |
| A.Lee | 2020 | 40 |
| A.Lee | 2021 | 46 |
| A.Cole | 2020 | 39 |
| A.Cole | 2021 | 42 |
Here are the desired results:
| player | Y1 | Y2 |
| ------ | ---- | ------- |
| A.Lee | 45 | 40 |
| A.Lee | 40 | 46 |
| A.Cole | 39 | 42 |
CodePudding user response:
Grouped by 'player', get the lead
values of 'average' as 'Y1' and 'average' as 'Y2' within summarise
and remove the NA
elements (by default the default
option in lead/lag
are NA)
library(dplyr)
df %>%
arrange(player, desc(season)) %>%
group_by(player) %>%
summarise(Y1 = lead(average), Y2 = average, .groups = 'drop') %>%
na.omit
-output
# A tibble: 3 × 3
player Y1 Y2
<chr> <int> <int>
1 A.Cole 39 42
2 A.Lee 40 46
3 A.Lee 45 40
data
df <- structure(list(player = c("A.Lee", "A.Lee", "A.Lee", "A.Cole",
"A.Cole"), season = c(2019, 2020, 2021, 2020, 2021), average = c(45L,
40L, 46L, 39L, 42L)), row.names = c(NA, -5L), class = "data.frame")
CodePudding user response:
Here is a different approach using mutate(across...
with lead
function and filter
and select
:
library(dplyr)
df %>%
mutate(across(c(season, average), lead, .names = "{.col}1")) %>%
mutate(diff = abs(season1 - season), .keep="unused") %>%
filter(diff ==1) %>%
dplyr::select(player, Y1 = average, Y2 = average1)
player Y1 Y2
1 A.Lee 45 40
2 A.Lee 40 46
3 A.Cole 39 42