I have an outcome name and two character vectors:
n <- 'winner'
p_list <- c('qualified', 'female', 'apple')
df_features <- c('female','qualified','admission','apple_B','apple_C','apple_D')
Given p_list and df_features, I want to generate a formula like this:
winner ~ apple_B + apple_C + apple_D + female + qualified
Basically, I am given p_list and n. I want to create a formula with n as the outcome and p_list as the regressors. However, if an element of p_list is not in df_features, I want to replace it with every element of df_features that has the same text before the underscore (_). So apple would be replaced by apple_B, apple_C and apple_D. Hopefully this makes sense.
How can I do this in R? I'd prefer a dplyr solution if possible.
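To pin down the replacement rule, here is a minimal base R sketch (no dplyr). It assumes that "the same text before the underscore" means an exact match on the prefix, computed here with sub(); the helper name expand_term is hypothetical.

```r
p_list      <- c("qualified", "female", "apple")
df_features <- c("female", "qualified", "admission", "apple_B", "apple_C", "apple_D")

# Prefix of each feature: everything before the first underscore
# (names without an underscore are returned unchanged).
prefixes <- sub("_.*$", "", df_features)

# Hypothetical helper: keep a term that is already a feature,
# otherwise return every feature whose prefix equals the term.
expand_term <- function(term) {
  if (term %in% df_features) term else df_features[prefixes == term]
}

expand_term("apple")   # "apple_B" "apple_C" "apple_D"
expand_term("female")  # "female"
```

Comparing prefixes with == avoids regular expressions entirely, so a term like apple cannot accidentally match a longer name such as pineapple_X.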
I've tried this so far:
f <- as.formula(paste(n, "~", paste(p_list, collapse = " + ")))
But this does not yet account for df_features or the replacement of apple.
I can also check whether the values in p_list are in df_features with p_list %in% df_features, but I'm not sure how to use that.
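p_list %in% df_features returns one logical value per element of p_list, so it can be used directly as an index to split p_list into the terms that can be used as-is and the ones that still need expanding. A short sketch with the vectors from the question (the names in_df, kept and to_expand are only illustrative):

```r
p_list      <- c("qualified", "female", "apple")
df_features <- c("female", "qualified", "admission", "apple_B", "apple_C", "apple_D")

in_df <- p_list %in% df_features   # TRUE TRUE FALSE

kept      <- p_list[in_df]    # "qualified" "female"  (usable as-is)
to_expand <- p_list[!in_df]   # "apple"               (needs expanding)
```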
Answer:
Use grep to pull out the elements of df_features that match each element of p_list, and pass the result to reformulate to produce the formula. No packages are used.
reformulate(unlist(sapply(p_list, grep, df_features, value = TRUE)), n)
## winner ~ qualified + female + apple_B + apple_C + apple_D
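One thing to keep in mind: grep does substring matching, so a term such as a hypothetical male in p_list would also pick up female. A sketch of the same one-liner with the pattern anchored to the start of the name and followed by either an underscore or the end of the string; it assumes the terms contain no regex metacharacters, and match_term is a hypothetical helper name.

```r
n           <- "winner"
p_list      <- c("qualified", "female", "apple")
df_features <- c("female", "qualified", "admission", "apple_B", "apple_C", "apple_D")

# Unanchored: "male" matches inside "female" as well.
grep("male", c("female", "male"), value = TRUE)       # "female" "male"

# Anchored: the term must start the name and be followed by "_" or the end.
match_term <- function(term, features) {
  grep(paste0("^", term, "(_|$)"), features, value = TRUE)
}
match_term("male", c("female", "male"))               # "male"

f <- reformulate(unlist(lapply(p_list, match_term, features = df_features)), n)
f
## winner ~ qualified + female + apple_B + apple_C + apple_D
```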
Answer:
The answer by G. Grothendieck is so good that I almost feel ashamed to post mine. I'll do so anyway, as I find that going the long way sometimes teaches you more about the tool at hand:
as.formula(paste0(n,
                  " ~ ",
                  paste(c(p_list[p_list %in% df_features],
                          grep(p_list[!p_list %in% df_features],
                               df_features,
                               value = TRUE)),
                        collapse = " + ")))
What is in there:
- as.formula converts the string to a formula.
- paste0 pastes together the string stored in n, the tilde, and the result of paste.
- paste concatenates the following, using " + " as the separator (collapse = " + "):
  - the elements of p_list that are in df_features (the TRUE entries of p_list %in% df_features), and
  - the result of grep on df_features for the element that is not a direct match (the FALSE entry), returning the values rather than the indexes (value = TRUE).
Note that grep uses only the first element of its pattern, so this works here because apple is the only element of p_list missing from df_features.
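Because grep uses only the first element of a pattern vector, the long-way answer expands just one unmatched term. If several entries of p_list might be missing from df_features, one option (a sketch, not part of either answer) is to compare pre-underscore prefixes with %in% instead of calling grep. The helper build_formula and the second example's variables are made up for illustration.

```r
n           <- "winner"
p_list      <- c("qualified", "female", "apple")
df_features <- c("female", "qualified", "admission", "apple_B", "apple_C", "apple_D")

# Hypothetical helper: terms found in df_features are kept as-is; each
# remaining term is replaced by all features sharing its pre-underscore prefix.
build_formula <- function(n, p_list, df_features) {
  in_df    <- p_list %in% df_features
  prefixes <- sub("_.*$", "", df_features)
  # %in% accepts any number of unmatched terms, unlike a single grep() call
  expanded <- df_features[prefixes %in% p_list[!in_df]]
  reformulate(c(p_list[in_df], expanded), response = n)
}

build_formula(n, p_list, df_features)
## winner ~ qualified + female + apple_B + apple_C + apple_D

# Made-up example with two unmatched terms:
build_formula("y", c("x", "apple", "colour"),
              c("x", "apple_B", "colour_red", "colour_blue"))
## y ~ x + apple_B + colour_red + colour_blue
```

Expanded terms come out in df_features order rather than p_list order, and an unmatched term with no features sharing its prefix is silently dropped, so you may want to check for that case before building the formula.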