Outputting formula based on list of strings

Time:09-21

I have two lists:

n <- 'winner'
p_list <- c('qualified', 'female', 'apple')
df_features <- c('female','qualified','admission','apple_B','apple_C','apple_D')


I want to generate a formula like so given p_list and df_features:

winner ~ apple_B + apple_C + apple_D + female + qualified

Basically, I am given p_list and n. I want to create a formula with n as the outcome and the elements of p_list as the regressors. However, if an element of p_list is not in df_features, I want to replace it with every element of df_features that has the same text before the underscore (_). So apple would be replaced by apple_B, apple_C and apple_D. Hopefully this makes sense.

How can I do this in R? (I would prefer a dplyr solution if possible.)

I've tried this so far:

f <- as.formula(paste(n, "~", paste(p_list, collapse = "+")))

But this does not take df_features into account, so it does not expand the variable apple.

I'm also able to check if values in p_list are in df_features by p_list %in% df_features, but not sure how to use it right now.

CodePudding user response:

Use grep to pull out the elements of df_features that match each element of p_list, then pass the result to reformulate to build the formula. No packages are used.

reformulate(unlist(sapply(p_list, grep, df_features, value = TRUE)), n)
## winner ~ qualified + female + apple_B + apple_C + apple_D
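One caveat: grep matches substrings, so the pattern "apple" would also pick up a column such as "pineapple". A stricter variant is sketched below, assuming that expanded columns always have the form prefix_suffix. The helper name expand_term is hypothetical and not part of base R.

```r
n <- "winner"
p_list <- c("qualified", "female", "apple")
df_features <- c("female", "qualified", "admission",
                 "apple_B", "apple_C", "apple_D")

# Hypothetical helper: keep a term that is an exact column name,
# otherwise return every column that starts with "<term>_".
expand_term <- function(term, features) {
  if (term %in% features) {
    term
  } else {
    features[startsWith(features, paste0(term, "_"))]
  }
}

# Expand each requested term, then flatten to one character vector.
rhs <- unlist(lapply(p_list, expand_term, features = df_features))

f <- reformulate(rhs, response = n)
f
## winner ~ qualified + female + apple_B + apple_C + apple_D
```

A term with no exact match and no prefix match is dropped silently here. If that should be an error instead, a check on length(rhs) or on each helper result could be added.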

CodePudding user response:

The answer by G. Grothendieck is so good that I almost feel ashamed to post mine. However, I will, because I find that going the long way sometimes teaches you more about the tool at hand:

as.formula(paste0(n, 
                  " ~ ", 
                  paste(c(p_list[p_list %in% df_features == TRUE],
                        grep(p_list[p_list %in% df_features == FALSE],
                             df_features, 
                             value = TRUE)), 
                        collapse = "+")))

What is in there:

  • as.formula converts a string to a formula.
  • paste0 joins the string stored in n, the tilde, and the result of paste.
  • paste concatenates the following, using "+" as the separator (collapse = "+"):
      • those elements of p_list that are in df_features (hence TRUE),
      • the result of grep on df_features for the elements that are not a direct match (FALSE), returning the values rather than the indexes (value = TRUE).
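Note that grep uses only the first element of its pattern argument, so the code above works when exactly one element of p_list is missing from df_features. The sketch below extends the same idea to several missing elements by building a single alternation pattern. The extra region entries are made-up illustration data, and the sketch assumes the names contain no regex special characters.

```r
n <- "winner"
# Illustration data: "region" and its columns are hypothetical additions.
p_list <- c("qualified", "apple", "female", "region")
df_features <- c("female", "qualified", "admission",
                 "apple_B", "apple_C", "apple_D",
                 "region_N", "region_S")

in_features <- p_list %in% df_features

# Terms that are column names already.
direct <- p_list[in_features]

# Terms that need expanding: match "<term>_" at the start of a column name.
if (any(!in_features)) {
  prefix_pattern <- paste0("^(", paste(p_list[!in_features], collapse = "|"), ")_")
  expanded <- grep(prefix_pattern, df_features, value = TRUE)
} else {
  expanded <- character(0)
}

f <- as.formula(paste(n, "~", paste(c(direct, expanded), collapse = " + ")))
f
```

The direct matches come first, followed by the expanded columns in the order in which they appear in df_features.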