I need to know for each player who played for the New York Mets (NYM) how many consecutive years they played, starting with the year they first joined the team.
Sample Input
In the sample output below, there is an error I need fixed. Player C started playing for the Mets in 1984 (see Sample Input above), but his first stint with the team ended in 1987 as he did not play for them in 1988, so though the Sample Output shows it as 5 consecutive years, it should be 4.
I would prefer a dplyr solution but am open to others.
I checked the available stackoverflow solutions, but could not find one that directly addressed my situation. If one exists, I would appreciate it if you could share its link.
Sample Output
Here is my code so far:
cym |>
filter(Team == "NYM") |>
group_by(Player) |>
count(name = "Consec_Yrs")
Dput(cym):
structure(list(Player = c("A", "A", "A", "A", "B", "B", "B",
"B", "C", "C", "C", "C", "C", "C", "C", "C", "D"), Team = c("NYM",
"NYM", "LAD", "LAD", "SFG", "NYM", "NYM", "NYM", "PHA", "NYM",
"NYM", "NYM", "NYM", "ATL", "ATL", "NYM", "NYM"), Year = c(1970,
1971, 1972, 1973, 1980, 1980, 1981, 1982, 1983, 1984, 1985, 1986,
1987, 1987, 1988, 1989, 2000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -17L))
CodePudding user response:
We may need to create a grouping based on the consecutive years
library(dplyr)
out <- cym %>%
group_by(Team, Player) %>%
mutate(grpYear = cumsum(c(TRUE, diff(Year) != 1))) %>%
group_by(grpYear, .add = TRUE) %>%
count(name = "Consec_Yrs") %>%
ungroup
-output
> out %>% filter(Team == "NYM")
# A tibble: 5 × 4
Team Player grpYear Consec_Yrs
<chr> <chr> <int> <int>
1 NYM A 1 2
2 NYM B 1 3
3 NYM C 1 4
4 NYM C 2 1
5 NYM D 1 1
If we want only a single max
consecutive count per group, then
out %>%
group_by(Team, Player) %>%
slice_max(n = 1, order_by = Consec_Yrs) %>%
ungroup
# A tibble: 8 × 4
Team Player grpYear Consec_Yrs
<chr> <chr> <int> <int>
1 ATL C 1 2
2 LAD A 1 2
3 NYM A 1 2
4 NYM B 1 3
5 NYM C 1 4
6 NYM D 1 1
7 PHA C 1 1
8 SFG B 1 1
CodePudding user response:
Another possible solution:
library(dplyr)
df %>%
group_by(Player, Team) %>%
mutate(aux = cumsum(c(1, diff(Year) == 1))) %>%
summarise(Consec_yrs = max(aux), .groups = "drop")
#> # A tibble: 8 × 3
#> Player Team Consec_yrs
#> <chr> <chr> <dbl>
#> 1 A LAD 2
#> 2 A NYM 2
#> 3 B NYM 3
#> 4 B SFG 1
#> 5 C ATL 2
#> 6 C NYM 4
#> 7 C PHA 1
#> 8 D NYM 1