I want to use dplyr's mutate() to create variables in a loop. I have four variables: yield_corn_total, yield_soybeans_total, yield_wheat_total and yield_sorghum_total. I want to create four new variables holding the log of each, named log_yield_corn_total, log_yield_soybeans_total, log_yield_wheat_total and log_yield_sorghum_total.
When I run the following code:
crops <- c( "corn", "soybeans", "wheat", "sorghum")
data <- data %>%
for (i in crops){
mutate(sym(paste0("log_yield_",i,"_total")) := log(paste0("yield_",i,"_total")))
}
I get the following error:
Error in for (. in i) crops :
4 arguments passed to 'for' which requires 3
CodePudding user response:
Don't use for loops, use across(). This is untested as you haven't provided sample data, but it should work. If not, please provide some sample data for debugging, e.g. dput(data[1:4, ])
library(dplyr)
crops <- c("corn", "soybeans", "wheat", "sorghum")
cols <- paste("yield", crops, "total", sep = "_")
data %>%
  # all_of() selects the columns named in an external character vector
  mutate(across(all_of(cols), log, .names = "log_{.col}"))
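For comparison, a loop can be made to work as well. The original attempt fails for two reasons: `%>%` passes `data` into `for` as an extra argument, and `log()` is applied to a character string rather than to the column it names. Below is a minimal sketch, assuming made-up sample data; the helper names `old` and `new` are only illustrative.

```r
library(dplyr)

# Hypothetical sample data; the values are made up for illustration
data <- data.frame(
  yield_corn_total     = c(34, 50),
  yield_soybeans_total = c(87, 20),
  yield_wheat_total    = c(34, 61),
  yield_sorghum_total  = c(12, 45)
)

crops <- c("corn", "soybeans", "wheat", "sorghum")

for (i in crops) {
  old <- paste0("yield_", i, "_total")  # existing column name
  new <- paste0("log_", old)            # name of the column to create
  # `!!new :=` takes the new column's name from a string;
  # `.data[[old]]` looks up the existing column by its name
  data <- data %>% mutate(!!new := log(.data[[old]]))
}

names(data)
```

The pipe now sits inside the loop body, so `for` itself is never on the receiving end of `%>%`.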
CodePudding user response:
Since you're already into the tidyverse ({dplyr}), you could also leverage one of the "tidy" principles: one column per variable. This means reshaping your data table from wide (one column per crop type x yield) to long (one column for crop type, one for yield). This makes a lot of manipulations easier. Only after all calculations are done, the table might be reverted to wide format for presentational purposes.
Example:
library(dplyr)
library(tidyr)

df <- data.frame(
  farm = 'Bonanza',
  yield_corn_total = 34,
  yield_soybeans_total = 87,
  yield_wheat_total = 34,
  yield_sorghum_total = 12
)
df %>%
  ## from wide to long:
  pivot_longer(
    cols = starts_with('yield'),
    names_to = 'crop',
    values_to = 'total_yield'
  ) %>%
  ## do some manipulations
  mutate(crop = crop %>%
           ## strip all except the actual crop name:
           gsub('yield_','',.) %>% gsub('_total','',.),
         log_yield = log(total_yield)
  ) %>%
  ## further manipulations like filtering, summarising
  ## e.g. filter (log_yield > 2, farm == 'Ponderosa')
  ## ...
  ## if need be, make it a 'wide' table again:
  pivot_wider(
    values_from = ends_with('yield'),
    names_from = crop
  )
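As a possible variation on the wide-to-long step, the crop name can be extracted during the pivot itself with the `names_pattern` argument of `pivot_longer()`, which avoids the two `gsub()` calls. This is a sketch only, reusing the same made-up one-row table; `long` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same hypothetical one-row table as in the example above
df <- data.frame(
  farm = 'Bonanza',
  yield_corn_total = 34,
  yield_soybeans_total = 87,
  yield_wheat_total = 34,
  yield_sorghum_total = 12
)

long <- df %>%
  pivot_longer(
    cols = starts_with('yield'),
    names_to = 'crop',
    ## keep only the part between 'yield_' and '_total'
    names_pattern = 'yield_(.*)_total',
    values_to = 'total_yield'
  ) %>%
  mutate(log_yield = log(total_yield))

long
```

The regular expression assumes every selected column follows the yield_<crop>_total naming scheme.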