Creating a nested loop to execute a function on a list of variables from a list of data frames

Time:12-07

I have three data frames, whose names could be stored like this:

dfs <- list("ibu_819", "ibu_1121", "ibu_1022")

and a list of variables for which I need to complete a very simple operation: changing all the 2s to 0s (an incorrectly coded dummy variable)

vars <- list("bene_lastyear", "bene_nextyear", "child_death","citychild")

I have done so successfully using this clunky code

ibu_819 <- ibu_819 %>%
  mutate(bene_lastyear = if_else(bene_lastyear == 2, 0,1),
         bene_nextyear = if_else(bene_nextyear == 2, 0,1),
         child_death = if_else(child_death == 2, 0,1),
         citychild = if_else(citychild == 2, 0,1))

ibu_1121 <- ibu_1121 %>%
  mutate(bene_lastyear = if_else(bene_lastyear == 2, 0,1),
         bene_nextyear = if_else(bene_nextyear == 2, 0,1),
         child_death = if_else(child_death == 2, 0,1),
         citychild = if_else(citychild == 2, 0,1))

ibu_1022 <- ibu_1022 %>%
  mutate(bene_lastyear = if_else(bene_lastyear == 2, 0,1),
         bene_nextyear = if_else(bene_nextyear == 2, 0,1),
         child_death = if_else(child_death == 2, 0,1),
         citychild = if_else(citychild == 2, 0,1))

I have always performed my data cleaning in Stata, where I would certainly want to take care of this task in one tidy loop, but I can't figure out how to do so in R. I'd love it if someone could show me how to do exactly what I have done by looping over the two lists provided above, writing the actual mutate call only once.

(also open to suggestions for a prettier solution than my if_else strategy. I'm sure there's a more fluid way to change my 2s to 0s, but I just did what I did because I knew how.)

ALSO, I should note that I do not want to append my data frames just yet, so please don't solve this by combining the data frames and then looping through the variables.

CodePudding user response:

Another option using Map

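# Create dummy data: a named list of three data frames
l <- list(df1 = data.frame(a = 1:10),
          df2 = data.frame(b = 1:10),
          df3 = data.frame(c = 1:10))
var <- c("a", "b", "c")
# Function that replaces every 2 in column `var` of `df` with 0
myfun <- function(df, var) {
  df[df[[var]] == 2, var] <- 0
  return(df)
}
# Map pairs the inputs element-wise: l[[1]] with var[1], l[[2]] with var[2], ...
res <- Map(myfun, l, var)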

Here the original list of data frames is preserved, and every value equal to 2 is replaced with 0 in the new list of data frames, called res.
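Map as used above pairs each data frame with a single variable. For the situation in the question, where the same set of variables has to be recoded in every data frame, the same idea could be sketched with lapply. The toy data and the helper name recode_twos are made up for illustration; only the data frame and column names come from the question.

```r
# Toy stand-ins for the real data frames (values are invented).
ibu_819  <- data.frame(bene_lastyear = c(1, 2, 2), child_death = c(2, 1, 1))
ibu_1121 <- data.frame(bene_lastyear = c(2, 1, 1), child_death = c(1, 1, 2))

# A character vector of column names (the question stores these in a list).
vars <- c("bene_lastyear", "child_death")

# Hypothetical helper: recode 2 -> 0 in each named column of one data frame.
recode_twos <- function(df, vars) {
  for (v in vars) {
    df[[v]][df[[v]] == 2] <- 0
  }
  df
}

# A named list records which result belongs to which data frame.
dfs <- list(ibu_819 = ibu_819, ibu_1121 = ibu_1121)
res <- lapply(dfs, recode_twos, vars = vars)
```

As with the Map version, the original objects are untouched; the recoded data frames live in res and can be reached as res$ibu_819 and so on.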

CodePudding user response:

Keeping the data frame names as a list of strings is a bit odd; a list of the data frames themselves would be better. That is:

dfs <- list(ibu_819, ibu_1121, ibu_1022)

Then you could use:

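for (i in seq_along(dfs)) {
  for (v in vars) dfs[[i]][[v]][dfs[[i]][[v]] == 2] <- 0
}

Note that only the copies inside the list are updated; the original ibu_819, ibu_1121 and ibu_1022 objects are left unchanged. The loop indexes into dfs rather than iterating with for (d in dfs), because d would itself be a copy of the list element and changes to it would be lost.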
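For readers who prefer the dplyr style used in the question, a list of data frames could also be processed with mutate and across, so the recoding is written only once. This is a sketch under assumptions: it needs dplyr 1.0.0 or later, the toy values are invented, and vars is a character vector rather than the list shown in the question.

```r
library(dplyr)

# Toy stand-ins for the real data frames (values are invented).
# `id` contains a 2 on purpose to show that unlisted columns are not touched.
ibu_819  <- data.frame(bene_lastyear = c(1, 2, 2), citychild = c(2, 1, 1), id = c(2, 5, 7))
ibu_1121 <- data.frame(bene_lastyear = c(2, 1, 1), citychild = c(1, 1, 2), id = c(2, 6, 8))

vars <- c("bene_lastyear", "citychild")

dfs <- list(ibu_819 = ibu_819, ibu_1121 = ibu_1121)

# Apply the same mutate() to every data frame; across() applies the
# recoding to each column named in vars.
dfs <- lapply(dfs, function(d) {
  mutate(d, across(all_of(vars), ~ replace(.x, .x == 2, 0)))
})
```

Unlike if_else(x == 2, 0, 1), replace changes only the 2s and leaves every other value as it was. If the updated data frames are needed back under their original names, list2env(dfs, envir = globalenv()) should copy each named element into the global environment.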

If you want to do it using the list of data frame names (i.e. dfs is the list of strings you currently have), then I think you have to fetch a copy of each data frame inside the loop with get(), modify it, and write it back with assign() when you're done. This isn't good practice, though.

for (d in dfs){
 df <- get(d)
 for(v in vars) df[[v]][df[[v]]==2] <- 0
 assign(d, df)
}

Finally, this pattern:

x[x==2] <- 0

is how I would replace all the 2s with 0s in a vector. It does the same as replace x = 0 if x == 2 in Stata.
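A small illustration of that pattern on a plain vector, including a missing value, may help. The vector is made up; replace() is the base R function that does the same thing without modifying its input.

```r
x <- c(1, 2, NA, 2)

# In-place form: elements equal to 2 become 0; the NA is left alone.
y <- x
y[y == 2] <- 0

# Functional form: replace() returns a modified copy and leaves x as it was.
z <- replace(x, x == 2, 0)
```

Both y and z end up with the 2s turned into 0s, while x keeps its original values.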
