Home > Mobile >  R: Group Over Multiple Rows and Columns
R: Group Over Multiple Rows and Columns

Time:03-07

I am working with the R programming language. I have the following data set:

set.seed(123)

Game = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18,19,19,20,20)

id = c(3,4,3,4,3,4,3,4,3,4,3,4, 3,4,3,4,3,4,3,4)

c <- c("1", "2")    
coin <- sample(c, 20, replace=TRUE, prob=c(0.5,0.5))

  
winner <- c("win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win", "win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win", "win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win", "win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win")

my_data = data.frame(Game, id, coin, winner)

The data ("my_data") looks something like this:

  Game id coin winner
1    1  3    2    win
2    1  4    1    win
3    2  3    2   lose
4    2  4    1   lose
5    3  3    1    tie
6    3  4    2    tie

For this dataset ("my_data"), I want to perform the following operations:

  • For each unique value of the "Game" variable (e.g. Game = 1, Game = 2, etc.), find out the frequency for each unique "Coin" combinations. For example, perhaps Coin = (2,1) happens 5 times , Coin = (2,2) happens 11 times, etc.

  • Next, for each of these unique "Coin" combinations - find the breakdowns of "win", "lose" and "tie". For example, for Coin = (2,2) there might be 5/11 wins, 3/11 loss and 3/11 tie.

I tried to accomplish this with the following code :

Part 1: (Manually) Find out unique coin combinations per Game (e.g, 1,1 OR 1,2 OR 2,1 OR 2,2)

for (i in 1:19) {
for (j in 2:20) {

my_data$comb = ifelse(my_data[i,3] == "1" & my_data[j,3] == "1", "one,one", ifelse(my_data[i,3] == "2" & my_data[j,3] == "1", "two, one", ifelse(my_data[i,3] == "1" & my_data[j,3] == "2", "one,two", "two,two)))

}
}

Part 2: (Had this worked) Find out Win/Tie/Loss Breakdown for each unique combination from Part 1:

library(dplyr)

my_data %>% group_by(comb) %>% summarise(percent = n() )

The desired output should look something like this (note: 1,2 = 2,1):

enter image description here

Currently, I am importing "my_data" into Microsoft Excel - but can someone please show me how to do this in R?

Can someone please show me how to obtain the above table?

Thanks!

CodePudding user response:

You can use the following codes:

library(tidyverse)
ff <- my_data %>% group_by(Game) %>% arrange(Game, coin) %>% 
  do(as.data.frame(t(combn(.[["coin"]], 2)))) %>% mutate(coin = paste(V1, V2, sep = ",")) %>% select(Game, coin)

my_data <- my_data %>% select(Game, winner) %>% distinct() %>% left_join(ff)

Then your desired output could be obtained by:

my_data %>% group_by(coin, winner) %>% summarise(n = n()) %>% mutate(p = 100 * n / sum(n, na.rm = T))

# A tibble: 5 x 4
# Groups:   coin [3]
  coin  winner     n     p
  <chr> <chr>  <int> <dbl>
1 1,1   lose       4 100  
2 1,2   lose       2  14.3
3 1,2   tie        4  28.6
4 1,2   win        8  57.1
5 2,2   lose       2 100  

CodePudding user response:

Here is my approach. I am not sure if this is the intended way:

library(dplyr)

my_data %>% 
  group_by(Game) %>% 
  mutate(combinations = toString(coin)) %>% 
  distinct(combinations, .keep_all = TRUE) %>% 
  ungroup() %>% 
  group_by(combinations, winner) %>% 
  summarise(n = n()) %>% 
  mutate(freq = n/sum(n)) 
  combinations winner     n  freq
  <chr>        <chr>  <int> <dbl>
1 1, 1         lose       4 1    
2 1, 2         tie        2 0.333
3 1, 2         win        4 0.667
4 2, 1         lose       2 0.25 
5 2, 1         tie        2 0.25 
6 2, 1         win        4 0.5  
7 2, 2         lose       2 1    
  • Related