Home > OS >  In R need count of consecutive years played
In R need count of consecutive years played

Time:02-23

I need to know for each player who played for the New York Mets (NYM) how many consecutive years they played, starting with the year they first joined the team.

Sample Input

Sample Input

In the sample output below, there is an error I need fixed. Player C started playing for the Mets in 1984 (see Sample Input above), but his first stint with the team ended in 1987 as he did not play for them in 1988, so though the Sample Output shows it as 5 consecutive years, it should be 4.

I would prefer a dplyr solution but am open to others.

I checked the available stackoverflow solutions, but could not find one that directly addressed my situation. If one exists, I would appreciate it if you could share its link.

Sample Output

Sample Output

Here is my code so far:

cym |>
  filter(Team == "NYM") |>
  group_by(Player) |>
  count(name = "Consec_Yrs")

Dput(cym):

structure(list(Player = c("A", "A", "A", "A", "B", "B", "B", 
"B", "C", "C", "C", "C", "C", "C", "C", "C", "D"), Team = c("NYM", 
"NYM", "LAD", "LAD", "SFG", "NYM", "NYM", "NYM", "PHA", "NYM", 
"NYM", "NYM", "NYM", "ATL", "ATL", "NYM", "NYM"), Year = c(1970, 
1971, 1972, 1973, 1980, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1987, 1988, 1989, 2000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -17L))

CodePudding user response:

We may need to create a grouping based on the consecutive years

library(dplyr)
out <- cym %>%
   group_by(Team, Player) %>%
   mutate(grpYear = cumsum(c(TRUE, diff(Year) != 1))) %>% 
   group_by(grpYear, .add = TRUE)  %>%
   count(name = "Consec_Yrs") %>%
   ungroup

-output

> out %>% filter(Team == "NYM")
# A tibble: 5 × 4
  Team  Player grpYear Consec_Yrs
  <chr> <chr>    <int>      <int>
1 NYM   A            1          2
2 NYM   B            1          3
3 NYM   C            1          4
4 NYM   C            2          1
5 NYM   D            1          1

If we want only a single max consecutive count per group, then

out %>% 
  group_by(Team, Player) %>%
  slice_max(n = 1, order_by = Consec_Yrs) %>%
  ungroup
# A tibble: 8 × 4
  Team  Player grpYear Consec_Yrs
  <chr> <chr>    <int>      <int>
1 ATL   C            1          2
2 LAD   A            1          2
3 NYM   A            1          2
4 NYM   B            1          3
5 NYM   C            1          4
6 NYM   D            1          1
7 PHA   C            1          1
8 SFG   B            1          1

CodePudding user response:

Another possible solution:

library(dplyr)

df %>% 
  group_by(Player, Team) %>% 
  mutate(aux = cumsum(c(1, diff(Year) == 1))) %>% 
  summarise(Consec_yrs = max(aux), .groups = "drop")

#> # A tibble: 8 × 3
#>   Player Team  Consec_yrs
#>   <chr>  <chr>      <dbl>
#> 1 A      LAD            2
#> 2 A      NYM            2
#> 3 B      NYM            3
#> 4 B      SFG            1
#> 5 C      ATL            2
#> 6 C      NYM            4
#> 7 C      PHA            1
#> 8 D      NYM            1
  • Related