Function that accepts factor and numerical inputs

I am working with the R programming language. I am trying to optimize a function that can accept numerical and factor inputs.

For the optimization, I use the GA library.
My references: demo, actual Library, specific function I'm using

Suppose I have a function that looks like this:

my_function <- function(r1, r2) {
    #define function here, e.g:

    #this "select" can be done using "dplyr" or SQL
    part1 <- SELECT * FROM my_data WHERE (col_1 IN r1) AND (col_2 > r2)

    part2<- mean(part1$col_3)
}

In this example:

r1 can take any group of values of a, b, c, d (factor variable),
e.g. r1 = a, r1 = a,d, r1 = b,c,a, r1 = c, r1 = a,b,c,d etc.
r2 can take a single value between 1 and 100 (numeric variable)
my_data is a dataset that has 3 columns: col_1 (factor, can only take values a, b, c, d), col_2 (numeric), col_3 (numeric)
my_data will be "subsetted" according to r1 and r2
the mean of col_3 is the value that my_function will return given a choice of r1 and r2
the mean of col_3 will be the value that I am trying to optimize for a choice of r1 and r2

Problem: Currently, I am trying to optimize my_function using the ga function in R:

library(GA)
GA <- ga(type = "real-valued", 
         fitness = function(x)  my_function(x[1], x[2]),
         lower = c(c("a", "b", "c", "d"), 1), upper = c(c("a", "b", "c", "d"), 100), 
         popSize = 50, maxiter = 1000, run = 100)

But I am not sure how to set this up correctly.
I am not sure how to correctly define my_function and I am not sure how to correctly define GA.

CodePudding user response:

I think you are looking for something like this:

library("dplyr")

df <- data.frame(a = rep(letters[1:3], each=2),
                 b = rep(c(1,9), 3),
                 c = 1:6)
df
#>   a b c
#> 1 a 1 1
#> 2 a 9 2
#> 3 b 1 3
#> 4 b 9 4
#> 5 c 1 5
#> 6 c 9 6

my_subset_mean <- function(r1, r2){ ## Assumes an object `df` with cols a|b|c
  subset <- df %>% filter(a %in% r1, b > r2)
  return(mean(subset$c))
}

my_subset_mean(r1 = c("a"), r2 = 5) ## ~mean(2)
#> [1] 2
my_subset_mean(r1 = c("a", "b"), r2 = 0) ## ~mean(1:4)
#> [1] 2.5
my_subset_mean(r1 = c("a", "b"), r2 = 10) ## ~mean of df with 0 rows
#> [1] NaN

Created on 2021-09-25 by the reprex package (v2.0.0)
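For the GA part of the question, one possible approach (a sketch, not the only way) is to encode the choice of r1 as one number per factor level and keep r2 as an ordinary real number, so that the whole input becomes a numeric vector that ga(type = "real-valued") can handle. The helper names decode and fitness, the 0.5 cut-off, and the penalty for empty subsets are my own assumptions for illustration. The subsetting is done in base R here so the block does not need dplyr.

```r
## Same toy data as above (cols a | b | c).
df <- data.frame(a = rep(letters[1:3], each = 2),
                 b = rep(c(1, 9), 3),
                 c = 1:6)
lvls <- sort(unique(as.character(df$a)))   # "a" "b" "c"

## Hypothetical helper: turn a numeric vector x into (r1, r2).
## x[1..k] lie in [0, 1]; a level is included when its entry is >= 0.5.
## x[k + 1] is the numeric threshold r2.
decode <- function(x) {
  k <- length(lvls)
  list(r1 = lvls[x[seq_len(k)] >= 0.5], r2 = x[k + 1])
}

## Fitness: mean of c over the subset, as in my_subset_mean().
## An empty subset gives NaN, which is replaced by a penalty
## (assumption: any value below the smallest possible mean will do).
fitness <- function(x) {
  p <- decode(x)
  m <- mean(df$c[df$a %in% p$r1 & df$b > p$r2])
  if (is.nan(m)) min(df$c) - 1 else m
}

fitness(c(1, 0, 0, 5))    # r1 = "a",         r2 = 5
fitness(c(1, 1, 0, 0))    # r1 = c("a", "b"), r2 = 0
fitness(c(1, 1, 0, 10))   # empty subset, returns the penalty

## Only runs if the GA package is installed. ga() maximizes the fitness.
if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "real-valued",
                fitness = fitness,
                lower = c(rep(0, length(lvls)), min(df$b) - 1),
                upper = c(rep(1, length(lvls)), max(df$b)),
                popSize = 50, maxiter = 200, run = 50)
  decode(res@solution[1, ])   # best r1 / r2 found
}
```

With your own data you would replace df, a, b, c by my_data, col_1, col_2, col_3 and set the bounds for r2 to 1 and 100. If you want to minimize instead, return the negative mean from fitness.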