How do I use a for loop to apply the paste0 function to many dataframes?

Time:07-01

I have the following dataframes:

hh02 <- c("exp_02", "m1_02", "m2_02")

I have tried to create a new variable called hhid in each item in hh02 by doing the following:

library(dplyr)

for(i in hh02){
  formula_hhid02 <- as.formula(paste0(i$tinh02, i$huyen02, i$xa02, i$diaban02, i$hoso02))
  i$hhid <- formula_hhid02
}

However, I get this error message: Error: $ operator is invalid for atomic vectors

Is there a way to create a new variable called hhid in each dataframe, preferably using dplyr? Thank you.

Answer:

Problem

Assuming exp_02, m1_02 and m2_02 are your dataframes, keep in mind that

hh02 <- c("exp_02", "m1_02", "m2_02")

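does not put those dataframes together in a list; it only creates a character vector containing three strings. Inside your loop, i is therefore a string such as "exp_02", not a dataframe, and $ cannot be used on a character (atomic) vector, which is exactly what the error message says. So you can't access the dataframes through these strings directly, as attempted in the question.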
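A quick way to see this in the console is the small sketch below. `exp_02` here is a made-up stand-in with a single invented `tinh02` column, not the asker's real data.

```r
# hypothetical stand-in for one of the question's dataframes
exp_02 <- data.frame(tinh02 = c("01", "02"), stringsAsFactors = FALSE)

hh02 <- c("exp_02", "m1_02", "m2_02")
i <- hh02[1]

class(i)       # "character": i is the name, not the dataframe
is.atomic(i)   # TRUE

# `$` on a character vector raises the error from the question
msg <- tryCatch(i$tinh02, error = function(e) conditionMessage(e))
msg            # "$ operator is invalid for atomic vectors"

# get() looks the name up and returns the actual dataframe
get(i)$tinh02  # "01" "02"
```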

An obvious idea is to put the actual dataframes in a list, but even then the loop cannot modify them: in each iteration, i holds a copy of the dataframe, not a reference to it, so any change made to i is lost:

# doesn't work:
hh02 <- list(exp_02, m1_02, m2_02)
for(i in hh02){
  i$newcolumn <- "hello world"
}
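To make the copy behaviour visible, and to show the usual workaround of indexing into the list instead of modifying the loop variable, here is a minimal sketch with two made-up data frames `a` and `b`. Note that the changes end up in the list; the original `a` and `b` objects outside it are not touched.

```r
# small made-up data frames standing in for exp_02, m1_02, ...
a <- data.frame(x = 1:2)
b <- data.frame(x = 3:4)
hh02 <- list(a, b)

# for (i in hh02) hands over each element as a copy,
# so this only changes the temporary i, never hh02
for (i in hh02) {
  i$newcolumn <- "hello world"
}
"newcolumn" %in% names(hh02[[1]])   # FALSE

# looping over positions and assigning through [[ ]]
# writes the change back into the list itself
for (k in seq_along(hh02)) {
  hh02[[k]]$newcolumn <- "hello world"
}
"newcolumn" %in% names(hh02[[1]])   # TRUE

# the original a outside the list is still unchanged
"newcolumn" %in% names(a)           # FALSE
```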

Solution

We can use the functions get() and assign() to retrieve or assign the value of objects by string names, respectively:

# create some example data frames
ex1 <- mtcars
ex2 <- iris
ex3 <- PlantGrowth

# create a list of data frame names as string
hh02 <- c("ex1", "ex2", "ex3")

# loop over string names
for(i in hh02){

  # get a value from dataframe 'i':
  df <- get(i)
  
  # modify
  df$newcolumn <- "hello world"
  
  # store
  assign(i, df)
}
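Applied to the question itself, the same get()/assign() loop can build hhid with paste0(). This is a sketch under assumptions: the three data frames below are invented, only the column names tinh02, huyen02, xa02, diaban02 and hoso02 come from the question, and the loop is assumed to run at the top level. Inside a function, assign(i, df) would create local copies, so you would need something like assign(i, df, envir = .GlobalEnv) instead.

```r
# hypothetical versions of the question's data frames;
# the column names follow the question, the values are invented
exp_02 <- data.frame(tinh02 = "01", huyen02 = "002", xa02 = "00003",
                     diaban02 = "004", hoso02 = "05",
                     stringsAsFactors = FALSE)
m1_02 <- exp_02
m2_02 <- exp_02

hh02 <- c("exp_02", "m1_02", "m2_02")

for (i in hh02) {
  df <- get(i)
  # paste0() already returns a character vector, so as.formula() is not needed
  df$hhid <- paste0(df$tinh02, df$huyen02, df$xa02, df$diaban02, df$hoso02)
  assign(i, df)
}

m2_02$hhid   # "010020000300405"
```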

Answer:

When you loop over hh02, i is just the string "exp_02" (or one of the other names) in each iteration, not the dataframe itself. You want to put your dataframes into a list instead:

df_list <- list(
  exp_02 = exp_02,
  m1_02 = m1_02,
  m2_02 = m2_02
)
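If there are many dataframes, typing the list by hand gets tedious. One option, sketched here with invented stand-in data, is base R's mget(), which takes a character vector of object names (like the question's hh02) and returns a named list of those objects.

```r
# hypothetical stand-ins for the question's data frames
exp_02 <- data.frame(tinh02 = "01", stringsAsFactors = FALSE)
m1_02  <- data.frame(tinh02 = "02", stringsAsFactors = FALSE)
m2_02  <- data.frame(tinh02 = "03", stringsAsFactors = FALSE)

hh02 <- c("exp_02", "m1_02", "m2_02")

# mget() looks up every name in hh02 and returns a named list,
# equivalent to list(exp_02 = exp_02, m1_02 = m1_02, m2_02 = m2_02)
df_list <- mget(hh02)

names(df_list)         # "exp_02" "m1_02" "m2_02"
df_list$m1_02$tinh02   # "02"
```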

Then you can loop over the elements of the list, each of which is a data.frame.

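# loop over the names and assign through df_list[[nm]], so the new
# column is stored in the list rather than in a temporary copy
for (nm in names(df_list)) {
  df <- df_list[[nm]]
  df_list[[nm]]$hhid <- paste0(df$tinh02, df$huyen02, df$xa02, df$diaban02, df$hoso02)
}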

Without a reproducible example I can't verify that this adds the hhid column to each data.frame in the list, but I hope this is enough to get you going.
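If you also need the separate objects (exp_02, m1_02, ...) updated, rather than working with the list from then on, one option is to copy the list elements back with base R's list2env(). A sketch with invented data using only two of the question's columns, assuming you want the results in the global environment:

```r
# hypothetical stand-ins; only two of the question's columns are used
exp_02 <- data.frame(tinh02 = "01", hoso02 = "05", stringsAsFactors = FALSE)
m1_02  <- data.frame(tinh02 = "02", hoso02 = "06", stringsAsFactors = FALSE)

df_list <- list(exp_02 = exp_02, m1_02 = m1_02)

# modify each element through its name so the list itself changes
for (nm in names(df_list)) {
  df_list[[nm]]$hhid <- paste0(df_list[[nm]]$tinh02, df_list[[nm]]$hoso02)
}

# copy the modified data frames back over the separate objects
invisible(list2env(df_list, envir = globalenv()))

exp_02$hhid   # "0105"
m1_02$hhid    # "0206"
```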

Answer:

library(tidyverse)

Example data

exp_02 <- tibble(data = rnorm(10),
                 A = rnorm(10))

m1_01 <- tibble(data = rnorm(10),
                A = rnorm(10))

m1_02 <- tibble(data = rnorm(10),
                A = rnorm(10))

# A tibble: 10 x 2
      data       A
     <dbl>   <dbl>
 1  0.187   1.78  
 2 -0.895   0.969 
 3  0.360  -0.0876
 4  0.489   0.410 
 5  1.84    2.32  
 6 -1.65    0.948 
 7  0.0524 -1.44  
 8  1.56    3.13  
 9  0.410   0.690 
10  1.36    1.22      

Put them into a list

hh02 <- list(exp_02,
             m1_01,
             m1_02)

Create a new column, for example column A multiplied by 2:

imap(hh02, ~ bind_cols(.x, hhid = .x$A * 2))

[[1]]
# A tibble: 10 x 3
      data       A   hhid
     <dbl>   <dbl>  <dbl>
 1 -0.164   1.48    2.95 
 2  0.344  -1.33   -2.66 
 3 -1.99   -0.307  -0.614
 4  1.19   -0.458  -0.916
 5 -0.836  -0.0525 -0.105
 6  1.19   -1.22   -2.44 
 7  1.81   -0.349  -0.698
 8 -0.0463  0.444   0.888
 9 -0.261  -0.663  -1.33 
10 -0.0405  0.708   1.42 

[[2]]
# A tibble: 10 x 3
      data       A   hhid
     <dbl>   <dbl>  <dbl>
 1 -1.30    0.279   0.559
 2  0.0888 -0.969  -1.94 
 3  1.48    0.444   0.889
 4 -1.25    0.656   1.31 
 5  1.80   -0.512  -1.02 
 6 -0.121  -0.0720 -0.144
 7 -0.469  -1.64   -3.29 
 8  1.43   -1.17   -2.34 
 9  0.555  -0.393  -0.786
10 -0.139   0.241   0.483
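The example above multiplies A by 2 as a placeholder. Applied to the question's goal, the same list-based pattern with dplyr's mutate() and paste0() might look like the sketch below. It assumes the data frames contain the character columns tinh02, huyen02, xa02, diaban02 and hoso02 named in the question (the values here are invented), and it uses map() instead of imap() because the element index is not needed. The result has to be assigned back; otherwise map() only prints the modified copies.

```r
library(dplyr)
library(purrr)

# hypothetical data frames using the question's column names
exp_02 <- tibble(tinh02 = "01", huyen02 = "002", xa02 = "00003",
                 diaban02 = "004", hoso02 = "05")
m1_02 <- exp_02
m2_02 <- exp_02

# a named list, so map() keeps the names on the result
hh02 <- list(exp_02 = exp_02, m1_02 = m1_02, m2_02 = m2_02)

# add hhid to every data frame and store the result back in hh02
hh02 <- map(hh02, ~ mutate(.x, hhid = paste0(tinh02, huyen02, xa02, diaban02, hoso02)))

hh02$m1_02$hhid   # "010020000300405"
```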