R: Playing "Guess Who" in R

Time:12-21

I am working with the R programming language.

I am trying to create a game similar to "Guess Who?" (https://en.wikipedia.org/wiki/Guess_Who?), a game in which players try to narrow down an in-game character based on a series of guesses.

Here is a dataset I simulated that contains the "counts" for athletes having different characteristics:

hair_color = factor(c("black", "brown", "blonde", "bald"))
glasses = factor(c("yes", "no", "contact lenses"))
sport = factor(c("football", "basketball", "tennis"))
gender = factor(c("male", "female", "other"))

problem = expand.grid(var1 = hair_color, var2 = glasses, var3 = sport, var4 = gender)
problem$counts = as.integer(rnorm(108, 20,5))
dataset = problem

    var1 var2     var3 var4 counts
1  black  yes football male     22
2  brown  yes football male     16
3 blonde  yes football male     12
4   bald  yes football male     22
5  black   no football male     14
6  brown   no football male     19
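
Because the counts come from rnorm(), they change on every run. A minimal sketch of a reproducible version follows; the seed value 123 is an arbitrary assumption and is not from the original post, so the counts it produces will differ from the ones printed above.

```r
# Assumption: the seed 123 is arbitrary; any fixed value makes the run repeatable.
set.seed(123)

hair_color <- factor(c("black", "brown", "blonde", "bald"))
glasses <- factor(c("yes", "no", "contact lenses"))
sport <- factor(c("football", "basketball", "tennis"))
gender <- factor(c("male", "female", "other"))

# One row per combination of levels: 4 * 3 * 3 * 3 = 108 rows
dataset <- expand.grid(var1 = hair_color, var2 = glasses, var3 = sport, var4 = gender)

# Use nrow() instead of a hard-coded 108 so the code survives added levels
dataset$counts <- as.integer(rnorm(nrow(dataset), 20, 5))

nrow(dataset)
```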

I then wrote a function that lets the user select rows from this dataset corresponding to a certain profile of characteristics:

   my_function <- function(dataset, var1 = NULL, var2 = NULL, var3 = NULL, var4 = NULL) {
    
    # Create a logical vector to store the rows that match the specified criteria
    selection <- rep(TRUE, nrow(dataset))
    
    # Filter rows based on the specified levels of var1
    if (!is.null(var1)) {
        selection <- selection & dataset$var1 %in% var1
    }
    
    # Filter rows based on the specified levels of var2
    if (!is.null(var2)) {
        selection <- selection & dataset$var2 %in% var2
    }
    
    # Filter rows based on the specified levels of var3
    if (!is.null(var3)) {
        selection <- selection & dataset$var3 %in% var3
    }
    
    # Filter rows based on the specified levels of var4
    if (!is.null(var4)) {
        selection <- selection & dataset$var4 %in% var4
    }
    
    # Select the rows that match the specified criteria
    selected_rows <- dataset[selection, ]
    
    # Return the selected rows
    return(selected_rows)
}

Now, to call the function and select all rows where the hair is black OR brown AND the glasses are "yes":

head(my_function(dataset, var1 = c("black", "brown"), var2 = c("yes")))

    var1 var2       var3   var4 counts
1  black  yes   football   male     22
2  brown  yes   football   male     16
13 black  yes basketball   male     14
14 brown  yes basketball   male      9
25 black  yes     tennis   male     13

Another example: select all rows where the hair is black OR brown AND the glasses are "no":

    head(my_function(dataset, var1 = c("black", "brown"), var2 = c("no")))

     var1 var2       var3   var4 counts
5   black   no   football   male     14
6   brown   no   football   male     19
17  black   no basketball   male     17
18  brown   no basketball   male     27

This leads me to my question - suppose I wanted to know the following: What is the (conditional) probability that an athlete wears glasses, given that they have black or brown hair?

Manually, I could answer the question like this:

a = my_function(dataset, var1 = c("black", "brown"), var2 = c("yes"))
b = my_function(dataset, var1 = c("black", "brown"), var2 = c("no"))
prob_yes = sum(a$counts) / (sum(a$counts) + sum(b$counts))
prob_no = sum(b$counts) / (sum(a$counts) + sum(b$counts))

> prob_yes
[1] 0.481203

> prob_no
[1] 0.518797
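
Note that the manual calculation above only uses the "yes" and "no" rows, so athletes with "contact lenses" are left out of the denominator. The sketch below illustrates the difference on a small hypothetical data frame (the name toy and its counts are made up for illustration, not taken from the simulated dataset): it sums the counts per level of var2 once, then normalizes either over yes/no only or over all levels.

```r
# Hypothetical toy data with hand-picked counts (not the dataset from the post)
toy <- data.frame(
  var1 = c("black", "black", "black", "brown", "brown", "brown", "bald", "bald", "bald"),
  var2 = rep(c("yes", "no", "contact lenses"), times = 3),
  counts = c(10, 20, 5, 30, 40, 5, 7, 8, 9)
)

# Keep only the rows that satisfy the condition on hair colour
subset_rows <- toy[toy$var1 %in% c("black", "brown"), ]

# Sum the counts within each level of var2
totals <- tapply(subset_rows$counts, subset_rows$var2, sum)

# Restricted to yes/no, as in the manual calculation
prob_yes_no <- totals[c("yes", "no")] / sum(totals[c("yes", "no")])

# Over every level of var2, including "contact lenses"
prob_all <- totals / sum(totals)

prob_yes_no
prob_all
```

With these toy counts the yes/no version gives 0.4 and 0.6, while the all-levels version gives smaller values for both because the contact-lens rows now share the denominator.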

I was wondering whether I could generalize this function. Suppose I wanted it to take as inputs:

  • Which variables, and which levels of these variables, to condition on (not all variables need to be selected)
  • Which single variable the conditional probability should be calculated for (e.g. "glasses")

And as an output:

  • The probabilities for all levels of that variable

As an example - the desired function could be called like this:

my_function(dataset, input_var_list = list(var1 = c("black", "brown"), var3 = c("football")),  conditional_var = c("var2"))

And this desired function would return:

  • The probability of wearing glasses given that the athlete has black/brown hair and plays football
  • The probability of not wearing glasses given that the athlete has black/brown hair and plays football

Can someone please help me re-write this function?

Thanks!

Answer:

Not the most efficient approach, but I kept to the structure you already had. Is this what you are looking for?

my_function <- function(dataset, var1 = NULL, var2 = NULL, var3 = NULL, var4 = NULL,
                        conditional_var = NULL) {
  
  # Create a logical vector to store the rows that match the specified criteria
  selection <- rep(TRUE, nrow(dataset))
  
  # Filter rows based on the specified levels of var1
  if (!is.null(var1)) {
    selection <- selection & dataset$var1 %in% var1
  }
  
  # Filter rows based on the specified levels of var2
  if (!is.null(var2)) {
    selection <- selection & dataset$var2 %in% var2
  }
  
  # Filter rows based on the specified levels of var3
  if (!is.null(var3)) {
    selection <- selection & dataset$var3 %in% var3
  }
  
  # Filter rows based on the specified levels of var4
  if (!is.null(var4)) {
    selection <- selection & dataset$var4 %in% var4
  }
  
  # Select the rows that match the specified criteria
  selected_rows <- dataset[selection, ]
  
  # Return the selected rows, or, if conditional_var is given,
  # the proportion of selected rows at each level of that variable
  if (is.null(conditional_var)) {
    return(selected_rows) 
  } else {
    return(prop.table(table(selected_rows[conditional_var])))
  }
}

my_function(dataset, var1 = c("black", "brown"), var3 = c("football"), conditional_var = c("var2"))

output

var2
contact lenses             no            yes 
     0.3333333      0.3333333      0.3333333
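
One caveat about the output above: table() counts rows, and expand.grid() produces the same number of rows for every level of var2, which is why each level comes out at exactly 1/3 regardless of the counts column. If the counts are meant to act as weights, as in the manual calculation in the question, they need to be summed per level instead. A minimal sketch of that idea follows, which also takes the filters as a named list so that any subset of variables can be used. The function name cond_prob, its arguments, and the toy data are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: name and arguments are illustrative, not from the post.
cond_prob <- function(dataset, filters = list(), conditional_var, count_var = "counts") {
  # Start with every row selected, then narrow down one filter at a time
  selection <- rep(TRUE, nrow(dataset))
  for (var_name in names(filters)) {
    selection <- selection & dataset[[var_name]] %in% filters[[var_name]]
  }
  selected_rows <- dataset[selection, ]

  # Sum the counts within each level of the conditional variable
  totals <- tapply(selected_rows[[count_var]], selected_rows[[conditional_var]], sum)
  totals[is.na(totals)] <- 0  # factor levels with no matching rows

  # Normalize so the probabilities sum to 1
  totals / sum(totals)
}

# Toy data with known counts; var1 varies fastest in expand.grid()
toy <- expand.grid(
  var1 = c("black", "brown", "bald"),
  var2 = c("yes", "no"),
  var3 = c("football", "tennis"),
  stringsAsFactors = TRUE
)
toy$counts <- 1:12

result <- cond_prob(toy,
                    filters = list(var1 = c("black", "brown"), var3 = "football"),
                    conditional_var = "var2")
result
```

In the toy data the matching rows carry counts 1 and 2 for "yes" and 4 and 5 for "no", so the result is 0.25 for "yes" and 0.75 for "no". Unlike the manual yes/no calculation in the question, this sketch normalizes over every level of the conditional variable; to exclude a level, add it to filters (for example filters$var2 = c("yes", "no")).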