Home > database >  Finding all the combinations of k elements among n columns in tidyverse
Finding all the combinations of k elements among n columns in tidyverse

Time:01-29

I am trying to compute stats for basketball lineups given the play-by-play data. In particular I have a dataframe which more or less looks like this

game_id action_id team_id event p1 p2 p3 p4 p5
G1 1 H 3PA A B C D E
G1 2 V REB A B C D E

I would like to make a sort of pivot_longer but instead of having a column with the names (p1:p5) and the values (A:E), I am looking for a function which allows me to find all combinations of k players ( 1 <= k <= 5 ). For instance, if k = 2, it would be something like:

game_id action_id team_id event p_comb1 p_comb2
G1 1 H 3PA A B
G1 1 H 3PA A C
G1 1 H 3PA A D
G1 1 H 3PA A E
G1 1 H 3PA B C
G1 1 H 3PA B D
G1 1 H 3PA B E
G1 1 H 3PA C D
G1 1 H 3PA C E
G1 1 H 3PA D E
G1 2 V REB A B
G1 2 V REB A C
G1 2 V REB A D
G1 2 V REB A E
G1 2 V REB B C
G1 2 V REB B D
G1 2 V REB B E
G1 2 V REB C D
G1 2 V REB C E
G1 2 V REB D E

In particular my requirements would be to:

  • be able to set k as a parameter of some function
  • avoid manual self joins (if it's not the only possible solution)
  • be able to do this operation after grouping, so I only look for combinations withing the same actions, and not all combinations in the dataset
  • do all within the same pipe, avoiding lapply and for cycles where possible

Does a function like even this exists?

I tried with self joining, which is in fact a possible solution, even if it doesn't look very elegant.

CodePudding user response:

You could do:

df %>%
  mutate(comb = apply(across(num_range('p', 1:5)), 1, function(x) as.data.frame(t(combn(x, 2))))) %>%
  unnest_longer(comb) %>%
  unpack(comb)

CodePudding user response:

Please check the below code

df %>% pivot_longer(starts_with('P'), names_to = 'name', values_to = 'p_comb1') %>% 
  group_by(game_id,action_id,team_id,event) %>% 
  mutate(row1=row_number(), p_comb2=p_comb1, row2=row1) %>% 
  complete(row1, row2) %>% filter(row2>row1) %>% 
  mutate(p_comb1=LETTERS[row1], p_comb2=LETTERS[row2]) %>% select(-starts_with('row'), -name)

Created on 2023-01-28 with reprex v2.0.2

# A tibble: 10 × 6
# Groups:   game_id, action_id, team_id, event [1]
   game_id action_id team_id event p_comb1 p_comb2
   <chr>   <chr>     <chr>   <chr> <chr>   <chr>  
 1 G1      1         H       3PA   A       B      
 2 G1      1         H       3PA   A       C      
 3 G1      1         H       3PA   A       D      
 4 G1      1         H       3PA   A       E      
 5 G1      1         H       3PA   B       C      
 6 G1      1         H       3PA   B       D      
 7 G1      1         H       3PA   B       E      
 8 G1      1         H       3PA   C       D      
 9 G1      1         H       3PA   C       E      
10 G1      1         H       3PA   D       E      

  • Related