I am trying to compute stats for basketball lineups given the play-by-play data. In particular I have a dataframe which more or less looks like this
game_id | action_id | team_id | event | p1 | p2 | p3 | p4 | p5 |
---|---|---|---|---|---|---|---|---|
G1 | 1 | H | 3PA | A | B | C | D | E |
G1 | 2 | V | REB | A | B | C | D | E |
I would like to make a sort of pivot_longer
but instead of having a column with the names (p1:p5) and the values (A:E), I am looking for a function which allows me to find all combinations of k players ( 1 <= k <= 5 ). For instance, if k = 2, it would be something like:
game_id | action_id | team_id | event | p_comb1 | p_comb2 |
---|---|---|---|---|---|
G1 | 1 | H | 3PA | A | B |
G1 | 1 | H | 3PA | A | C |
G1 | 1 | H | 3PA | A | D |
G1 | 1 | H | 3PA | A | E |
G1 | 1 | H | 3PA | B | C |
G1 | 1 | H | 3PA | B | D |
G1 | 1 | H | 3PA | B | E |
G1 | 1 | H | 3PA | C | D |
G1 | 1 | H | 3PA | C | E |
G1 | 1 | H | 3PA | D | E |
G1 | 2 | V | REB | A | B |
G1 | 2 | V | REB | A | C |
G1 | 2 | V | REB | A | D |
G1 | 2 | V | REB | A | E |
G1 | 2 | V | REB | B | C |
G1 | 2 | V | REB | B | D |
G1 | 2 | V | REB | B | E |
G1 | 2 | V | REB | C | D |
G1 | 2 | V | REB | C | E |
G1 | 2 | V | REB | D | E |
In particular my requirements would be to:
- be able to set k as a parameter of some function
- avoid manual self joins (if it's not the only possible solution)
- be able to do this operation after grouping, so I only look for combinations withing the same actions, and not all combinations in the dataset
- do all within the same pipe, avoiding
lapply
and for cycles where possible
Does a function like even this exists?
I tried with self joining, which is in fact a possible solution, even if it doesn't look very elegant.
CodePudding user response:
You could do:
df %>%
mutate(comb = apply(across(num_range('p', 1:5)), 1, function(x) as.data.frame(t(combn(x, 2))))) %>%
unnest_longer(comb) %>%
unpack(comb)
CodePudding user response:
Please check the below code
df %>% pivot_longer(starts_with('P'), names_to = 'name', values_to = 'p_comb1') %>%
group_by(game_id,action_id,team_id,event) %>%
mutate(row1=row_number(), p_comb2=p_comb1, row2=row1) %>%
complete(row1, row2) %>% filter(row2>row1) %>%
mutate(p_comb1=LETTERS[row1], p_comb2=LETTERS[row2]) %>% select(-starts_with('row'), -name)
Created on 2023-01-28 with reprex v2.0.2
# A tibble: 10 × 6
# Groups: game_id, action_id, team_id, event [1]
game_id action_id team_id event p_comb1 p_comb2
<chr> <chr> <chr> <chr> <chr> <chr>
1 G1 1 H 3PA A B
2 G1 1 H 3PA A C
3 G1 1 H 3PA A D
4 G1 1 H 3PA A E
5 G1 1 H 3PA B C
6 G1 1 H 3PA B D
7 G1 1 H 3PA B E
8 G1 1 H 3PA C D
9 G1 1 H 3PA C E
10 G1 1 H 3PA D E