looping over dataframes and looping over variables in a dataframe

Time:12-07

Stata user here, having trouble with loops in R. I won't trouble you with the "how I would do it in Stata" caveat; my issue is that, using loops (or apply), I cannot figure out how to properly loop through data frames or variables in R.

Suppose I have these data frames:

date <- c("2021-03-21", "2022-03-20", "2020-03-15")
char_nums1_a <- c("1", "2", "5")
char_nums2_a <- c("2", "3", "5")
df_a <- data.frame(date, char_nums1_a, char_nums2_a)

date <- c("2021-04-21", "2022-05-20", "2020-04-15")
char_nums1_b <- c("3", "2", "3")
char_nums2_b <- c("1", "2", "3")
df_b <- data.frame(date, char_nums1_b, char_nums2_b)

My objective is twofold:

  1. I would like to be able to perform a function on similarly named variables within two different data frames by looping through the names of those data frames (e.g., convert the date variable to date format within each data frame).

My instinct is as follows:

dfs <- list("df_a", "df_b")
for(x in dfs){
   x$date <- as.Date(x$date)
   }

I also tried

for(x in 1:2){
   dfs[x]$date <- as.Date(dfs[x]$date)
   }

  2. I would like to perform the same function on multiple variables from a list of variables within a data frame (e.g., convert two char variables to numeric).

My instinct is as follows:

vars <- c("char_nums1_b", "char_nums2_b")
for (var in vars) {
   df_b$var <- as.numeric(df_b$var)
   }

Again, not quite right.

As you might imagine, this is a simplified example of what I am trying to do with a much larger dataset (I'm dealing with 3 data frames of 161 vars each, and much of the cleaning I need to do can be grouped across all three data frames, or sets of 30-40 variables that all require the same transformation). Consequently, certain hard-coded solutions will not work.

For bonus points, it'd be great to also know how to loop through data frames and variables at once. Suppose the "_a" and "_b" were removed from my char_nums fields. How would I write a loop that performs a function on a list of variables for each data frame in a list of data frames?

CodePudding user response:

If it would help to combine the data frames, you might consider something like this, which also uses across to apply a function to a range of columns whose names have specified characteristics.

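library(dplyr)
# Stack the data frames, recording each row's source in "src",
# then convert columns selected by name pattern.
combined <- bind_rows(df_a = df_a, df_b = df_b, .id = "src") %>%
  mutate(across(contains("date"), as.Date),
         across(starts_with("char_nums"), as.numeric))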
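
If combining the data frames is not an option, a base R sketch along the following lines may cover both objectives and the "bonus" case. It assumes the "_a" and "_b" suffixes have been dropped so that both data frames share column names, and `clean_df` is a hypothetical helper name chosen only for illustration. The two ideas it relies on are storing the data frames themselves (not their names) in a list, and using `df[[var]]` rather than `df$var` when the column name is held in a variable.

```r
# Example data from the question, with the "_a"/"_b" suffixes removed
# (an assumption matching the "bonus" scenario).
df_a <- data.frame(
  date = c("2021-03-21", "2022-03-20", "2020-03-15"),
  char_nums1 = c("1", "2", "5"),
  char_nums2 = c("2", "3", "5")
)
df_b <- data.frame(
  date = c("2021-04-21", "2022-05-20", "2020-04-15"),
  char_nums1 = c("3", "2", "3"),
  char_nums2 = c("1", "2", "3")
)

# Store the data frames themselves in a named list, not their names as strings.
dfs <- list(df_a = df_a, df_b = df_b)
vars <- c("char_nums1", "char_nums2")

# Hypothetical helper: clean one data frame and return the modified copy.
clean_df <- function(df, vars) {
  df$date <- as.Date(df$date)
  # df[[var]] looks up the column named by the value of var;
  # df$var would look for a column literally called "var".
  for (var in vars) {
    df[[var]] <- as.numeric(df[[var]])
  }
  df
}

# lapply returns a new list; assign it back to keep the changes.
dfs <- lapply(dfs, clean_df, vars = vars)

str(dfs$df_a)
```

The original loops fail for two reasons: `list("df_a", "df_b")` holds strings rather than data frames, and the loop variable in `for` is a copy, so assigning to it does not change the source object. Working on a list and reassigning the result of `lapply` sidesteps both. If the cleaned data frames are needed back as separate objects, `list2env(dfs, envir = globalenv())` is one way to write them out, although keeping them in the list is often more convenient.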