R : How to use 'for' loop to create variables using dplyr's mutate?

Time:03-01

I want to use dplyr's 'mutate' to create variables in a loop. I have four variables: yield_corn_total, yield_soybeans_total, yield_wheat_total and yield_sorghum_total. I want to create four more variables holding the log of each existing one, named log_yield_corn_total, log_yield_soybeans_total, log_yield_wheat_total and log_yield_sorghum_total.

When I run the following code:

crops <- c("corn", "soybeans", "wheat", "sorghum")
data <- data %>%
  for (i in crops) {
    mutate(sym(paste0("log_yield_", i, "_total")) := log(paste0("yield_", i, "_total")))
  }

I get the following error:

Error in for (. in i) crops : 
  4 arguments passed to 'for' which requires 3

CodePudding user response:

Don't use a for loop; use across(). This is untested because you haven't provided sample data, but it should work. If it doesn't, please share some sample data for debugging, e.g. dput(data[1:4, ]).

library(dplyr)

crops <- c("corn", "soybeans", "wheat", "sorghum")
cols <- paste("yield", crops, "total", sep = "_")
# all_of() selects the columns named in the character vector `cols`
data <- data %>%
  mutate(across(all_of(cols), log, .names = "log_{.col}"))
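Since no sample data was posted, here is a minimal sketch on made-up numbers (the data frame below is hypothetical, not the asker's data). It runs the across() approach and, for comparison, a loop variant that should give the same columns. The original attempt failed for two reasons: `data %>% for (...)` pipes `data` into `for` as an extra argument, and `log(paste0(...))` takes the log of a string rather than of a column. The names `res_across`, `res_loop`, `old` and `new` are illustrative only.

```r
library(dplyr)

# Hypothetical sample data; the real `data` was not posted.
data <- data.frame(
  yield_corn_total     = c(34, 40),
  yield_soybeans_total = c(87, 90),
  yield_wheat_total    = c(34, 28),
  yield_sorghum_total  = c(12, 15)
)

crops <- c("corn", "soybeans", "wheat", "sorghum")
cols  <- paste("yield", crops, "total", sep = "_")

# across() version: adds one log_* column per yield column
res_across <- data %>%
  mutate(across(all_of(cols), log, .names = "log_{.col}"))

# Loop version: the loop wraps the pipeline instead of sitting inside it
res_loop <- data
for (i in crops) {
  old <- paste0("yield_", i, "_total")
  new <- paste0("log_", old)
  # !!new := sets the column name from a string;
  # .data[[old]] looks up the existing column by its string name
  res_loop <- res_loop %>% mutate(!!new := log(.data[[old]]))
}

names(res_across)
```

The loop works, but the across() form is shorter and keeps the column naming rule in one place (`.names`).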

CodePudding user response:

Since you're already using the tidyverse (dplyr), you could also apply one of the "tidy" principles: one column per variable. This means reshaping your data table from wide (one yield column per crop type) to long (one column for crop type, one for yield), which makes many manipulations easier. Only once all calculations are done would the table be pivoted back to wide format, for presentation purposes.

Example:

library(dplyr)
library(tidyr)  # pivot_longer(), pivot_wider()

df <- data.frame(
    farm = 'Bonanza',
    yield_corn_total = 34,
    yield_soybeans_total = 87,
    yield_wheat_total = 34,
    yield_sorghum_total = 12
)

df %>%
    ## from wide to long:
    pivot_longer(
        cols = starts_with('yield'),
        names_to = 'crop', 
        values_to = 'total_yield'
    ) %>%
    ## do some manipulations
    mutate(crop = crop %>%
               ## strip all except the actual crop name:
               gsub('yield_','',.) %>% gsub('_total','',.),
           log_yield = log(total_yield)
           ) %>%
    ## further manipulations like filtering, summarising
    ## e.g. filter (log_yield > 2, farm == 'Ponderosa')
    ## ... 
    ## if need be, make it a 'wide' table again:
    pivot_wider(
        values_from = ends_with('yield'),
        names_from = crop
    )        
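As a possible simplification of the gsub() step above, pivot_longer() can extract the crop name while reshaping, using its `names_pattern` argument (a regular expression whose capture group becomes the value of `crop`). A minimal sketch on the same example farm; the name `long` is just illustrative:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  farm = "Bonanza",
  yield_corn_total = 34,
  yield_soybeans_total = 87,
  yield_wheat_total = 34,
  yield_sorghum_total = 12
)

long <- df %>%
  pivot_longer(
    cols = starts_with("yield"),
    names_to = "crop",
    # keep only the part between "yield_" and "_total"
    names_pattern = "yield_(.*)_total",
    values_to = "total_yield"
  ) %>%
  mutate(log_yield = log(total_yield))

long
```

This assumes every selected column follows the yield_<crop>_total naming scheme; columns that don't match the pattern would need separate handling.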
