R assign/get variables as df names in two for-loops, one works and one doesn't

Time:05-25

I'm trying to use two for loops to

  1. import some CSV files into data frames, and
  2. apply my self-made function to those imported data frames.

I read up on the assign and get functions, and the first step (importing CSV to df) worked...

Lines <- c(101, 102, 103, 301, 311, 312, 514, 617, 706, 918)

for (i in 1:length(Lines)) {
    assign("dfName", paste0("df_Line", Lines[i]))
    assign("lineName", paste0("LineNum_", Lines[i], ".csv"))
    dfName <- read.csv(lineName, encoding = "UTF-8")
}

This code worked and gave me LineNum_101.csv as df_Line101, as I wanted, for every line from 101 to 918.

So I kept going and tried to apply 'function_Merged(df)' (a function I wrote; it is shown further down in this question). It was supposed to turn 'df_Line101' into 'df_101_Merged', for every line from 101 to 918. But what's the problem with this loop?

Lines <- c(101, 102, 103, 301, 311, 312, 514, 617, 706, 918)

for (i in 1:length(Lines)) {
    assign("dfName", paste0("df_Line", Lines[i]))
    assign("dfMerged", paste0("df_", Lines[i], "_Merged"))
    dfMerged <- function_Merged(dfName)
}

It says "Error: $ operator is invalid for atomic vectors".

I checked the variables themselves outside the loop, and

assign("dfName", paste0(("df_Line"),Lines[1]))
assign("dfMerged", paste0(("df_"),Lines[1],("_Merged")))
print(dfName)
print(dfMerged)

gave me [1] "df_Line101" and [1] "df_101_Merged" respectively.

I then tried to apply my function using these two assigned values. I've tried many, many things and am now out of ideas.

get(dfMerged) <- function_Merged(get(dfName))

gave me [cannot find the function "get<-"]

get(dfMerged) <- function_Merged(dfName)

gave me [$ operator is invalid for atomic vectors]

dfMerged <- function_Merged(get(dfName))

worked, sort of: function_Merged ran, but the result went into a brand-new data frame literally named dfMerged.

What should I do to make this for loop work? Any help or suggestion would be very useful. Thanks!


22/05/24: reproducible version (I think)

First, these are the libraries I loaded (I know many of them are not actually used in this code, but anyway)

library(ggplot2)
library(dplyr)
library(ggpmisc)
library(plotrix)
library(tidyverse)
library(lubridate)
library(ggrepel)
library(broom)
library(plotly)
library(reprex)
library(readxl)
library(zoo)
library(pracma)

I have about 100 CSV files in my working directory. The file names have the format "LineNum_(number).csv".

list.files()
'0523_visualize.ipynb' 'LineNum_101.csv' 'LineNum_102.csv' 'LineNum_103.csv' 'LineNum_104.csv' 'LineNum_105.csv' 'LineNum_106.csv' 'LineNum_107.csv' 'LineNum_108.csv' 'LineNum_113.csv' 'LineNum_114.csv' 'LineNum_115.csv' 'LineNum_116.csv' 'LineNum_117.csv' 'LineNum_119.csv' 'LineNum_121.csv' 'LineNum_201.csv' 'LineNum_202.csv' 'LineNum_203.csv' 'LineNum_211.csv' 'LineNum_212.csv' 'LineNum_213.csv' 'LineNum_216.csv' 'LineNum_301.csv' 'LineNum_311.csv' 'LineNum_312.csv' 'LineNum_313.csv' 'LineNum_314.csv' 'LineNum_315.csv' 'LineNum_316.csv' 'LineNum_317.csv' 'LineNum_318.csv' 'LineNum_501.csv' 'LineNum_511.csv' 'LineNum_512.csv' 'LineNum_513.csv' 'LineNum_514.csv' 'LineNum_601.csv' 'LineNum_602.csv' 'LineNum_603.csv' 'LineNum_604.csv' 'LineNum_605.csv' 'LineNum_606.csv' 'LineNum_607.csv' 'LineNum_608.csv' 'LineNum_611.csv' 'LineNum_612.csv' 'LineNum_613.csv' 'LineNum_614.csv' 'LineNum_615.csv' 'LineNum_616.csv' 'LineNum_617.csv' 'LineNum_618.csv' 'LineNum_619.csv' 'LineNum_620.csv' 'LineNum_622.csv' 'LineNum_701.csv' 'LineNum_703.csv' 'LineNum_704.csv' 'LineNum_705.csv' 'LineNum_706.csv' 'LineNum_711.csv' 'LineNum_712.csv' 'LineNum_802.csv' 'LineNum_911.csv' 'LineNum_912.csv' 'LineNum_916.csv' 'LineNum_918.csv'

each file looks like this:

df_Line311 <-read.csv("LineNum_311.csv", encoding = "UTF-8")
head(df_Line311, 5)

A data.frame: 5 × 5
  Date       On    Off   Transfer LineNum
  <chr>      <int> <int> <int>    <int>
1 2020-01-02 15623 12250 3288     311
2 2020-01-03 16598 13078 3410     311
3 2020-01-04 12081 9771  2296     311
4 2020-01-05 9543  7556  1835     311
5 2020-01-06 14779 11607 3321     311

df_Line101 <- read.csv("LineNum_101.csv", encoding = "UTF-8")
head(df_Line101, 5)

A data.frame: 5 × 5
  Date       On    Off   Transfer LineNum
  <chr>      <int> <int> <int>    <int>
1 2020-01-02 4250  3725  1061     101
2 2020-01-03 4463  3910  1099     101
3 2020-01-04 3214  2847  753      101
4 2020-01-05 2977  2562  660      101
5 2020-01-06 4197  3673  1041     101

... and so on.
Here the On/Off/Transfer variables are the numbers of people who got on, got off, or transferred on the bus line LineNum. For example, on 2020-01-02, 15623 people got on bus line 311.

Now I'm processing the data in three steps:

  1. get only workday (Tue, Wed, Thu) data
function_Workdays <- function(dataframe) {
    tempDF <- dataframe
    tempDF$Date <- as.Date(tempDF$Date)
    tempDF$Days <- weekdays(tempDF$Date)
    # 화요일, 수요일, 목요일 are Tuesday, Wednesday, Thursday in Korean
    # (weekdays() returns day names in the current locale)
    tempDF$Workdays <- factor(tempDF$Days %in% c("화요일", "수요일", "목요일"))
    tempDF <- subset(tempDF, Workdays == TRUE)
    return(tempDF)
}

df_Line311_Workdays <- function_Workdays(df_Line311)
head(df_Line311_Workdays, 5)
A data.frame: 5 × 7
   Date       On    Off   Transfer LineNum Days   Workdays
   <date>     <int> <int> <int>    <int>   <chr>  <fct>
1  2020-01-02 15623 12250 3288     311     목요일 TRUE
6  2020-01-07 14779 11510 3125     311     화요일 TRUE
7  2020-01-08 15571 12315 3433     311     수요일 TRUE
8  2020-01-09 15828 12773 3383     311     목요일 TRUE
13 2020-01-14 15620 12721 3354     311     화요일 TRUE
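
Because function_Workdays compares against Korean day names, it only matches when weekdays() returns Korean. Below is a minimal sketch of a filter that does not depend on the locale, assuming only the Date column is needed for the selection. The names function_Workdays_wday and sample_df are made up for illustration, the sample rows are copied from the tables in the question, and this variant does not add the Days and Workdays columns.

```r
# Hypothetical locale-independent variant of function_Workdays.
# as.POSIXlt()$wday numbers the weekdays 0 (Sunday) to 6 (Saturday).
function_Workdays_wday <- function(dataframe) {
    tempDF <- dataframe
    tempDF$Date <- as.Date(tempDF$Date)
    wday <- as.POSIXlt(tempDF$Date)$wday
    # keep Tuesday (2), Wednesday (3) and Thursday (4)
    tempDF[wday %in% c(2, 3, 4), , drop = FALSE]
}

# Small sample built from values shown in the question
sample_df <- data.frame(
    Date = c("2020-01-02", "2020-01-03", "2020-01-04",
             "2020-01-05", "2020-01-06", "2020-01-07"),
    On   = c(15623, 16598, 12081, 9543, 14779, 14779)
)

workdays_df <- function_Workdays_wday(sample_df)
print(workdays_df)   # keeps rows 1 (Thursday) and 6 (Tuesday)
```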

  2. apply the running median function (runmed) from the "stats" package to the On values for smoothing
function_Runmed <- function(dataframe) {
    tempDF <- dataframe
    # running median over a window of 7 observations
    tempDF$On_RunMed <- runmed(tempDF$On, 7)
    return(tempDF)
}

df_Line311_Runmed <- function_Runmed(df_Line311_Workdays)
head(df_Line311_Runmed, 5)
   Date       On    Off   Transfer LineNum Days   Workdays On_RunMed
   <date>     <int> <int> <int>    <int>   <chr>  <fct>    <dbl>
1  2020-01-02 15623 12250 3288     311     목요일 TRUE     15571
6  2020-01-07 14779 11510 3125     311     화요일 TRUE     15571
7  2020-01-08 15571 12315 3433     311     수요일 TRUE     15571
8  2020-01-09 15828 12773 3383     311     목요일 TRUE     15604
13 2020-01-14 15620 12721 3354     311     화요일 TRUE     15571

  3. apply the loess function, also from the stats package
function_Loess <- function(dataframe) {
    tempDF <- dataframe
    # loess needs a numeric predictor, so convert the dates to day counts
    tempDF$NumericDate <- as.numeric(tempDF$Date)
    LoessFunction <- stats::loess(On_RunMed ~ NumericDate, data = tempDF, span = 0.1)
    # predict() with se = TRUE returns the fitted values and their standard errors
    LoessFunction_value <- predict(LoessFunction, se = TRUE)
    Loess_Function_df <- data.frame(LoessFunction_value)
    tempDF$Loess_Fit <- Loess_Function_df$fit
    tempDF$Loess_SE <- Loess_Function_df$se.fit
    return(tempDF)
}

df_Line311_Runmed_Loess <- function_Loess(df_Line311_Runmed)
head(df_Line311_Runmed_Loess, 5)
A data.frame: 5 × 11
   Date       On    Off   Transfer LineNum Days   Workdays On_RunMed NumericDate Loess_Fit Loess_SE
   <date>     <int> <int> <int>    <int>   <chr>  <fct>    <dbl>     <dbl>       <dbl>     <dbl>
1  2020-01-02 15623 12250 3288     311     목요일 TRUE     15571     18263       15115.58  293.1331
6  2020-01-07 14779 11510 3125     311     화요일 TRUE     15571     18268       15437.50  210.3811
7  2020-01-08 15571 12315 3433     311     수요일 TRUE     15571     18269       15484.75  197.0860
8  2020-01-09 15828 12773 3383     311     목요일 TRUE     15604     18270       15526.54  184.9781
13 2020-01-14 15620 12721 3354     311     화요일 TRUE     15571     18275       15656.93  143.0892

and i merged those three...

function_Merged <- function(dataframe) {  
    df_Workdays <- function_Workdays(dataframe)
    df_Runmed <- function_Runmed(df_Workdays)
    df_Loess <- function_Loess(df_Runmed)
    return(df_Loess)
}

df_311_Merged <- function_Merged(df_Line311)
head(df_311_Merged, 5)
A data.frame: 5 × 11
   Date       On    Off   Transfer LineNum Days   Workdays On_RunMed NumericDate Loess_Fit Loess_SE
   <date>     <int> <int> <int>    <int>   <chr>  <fct>    <dbl>     <dbl>       <dbl>     <dbl>
1  2020-01-02 15623 12250 3288     311     목요일 TRUE     15571     18263       15115.58  293.1331
6  2020-01-07 14779 11510 3125     311     화요일 TRUE     15571     18268       15437.50  210.3811
7  2020-01-08 15571 12315 3433     311     수요일 TRUE     15571     18269       15484.75  197.0860
8  2020-01-09 15828 12773 3383     311     목요일 TRUE     15604     18270       15526.54  184.9781
13 2020-01-14 15620 12721 3354     311     화요일 TRUE     15571     18275       15656.93  143.0892

For the final step, I wanted to read all the CSV files in the working directory into data frames and apply this merged function to them, keeping the same df naming scheme (automated, of course; this is the very reason I tried the assign and get functions in a for loop).

Thanks to the answers I got, I see it is much easier with a list of data frames than with assign and get. I successfully read in all the CSV files and applied my merged function to them.

my_list <- c("LineNum_101.csv", "LineNum_102.csv", "LineNum_103.csv")
my_df = lapply(my_list, function(x) read.csv(x, encoding = "UTF-8") )
lapply(my_df, function(x) function_Merged(x))

summary(my_df)
     Length Class      Mode
[1,] 5      data.frame list
[2,] 5      data.frame list
[3,] 5      data.frame list

my_df[1]
A data.frame: 786 × 5
Date       On    Off   Transfer LineNum
<chr>      <int> <int> <int>    <int>
2020-01-02 4250  3725  1061     101
2020-01-03 4463  3910  1099     101
2020-01-04 3214  2847  753      101
2020-01-05 2977  2562  660      101
2020-01-06 4197  3673  1041     101

But I need those data frames to come out of the list and have their own names after the merged function is applied. How can I do this? Is there a way to export each data frame in the list with an automated name? (I want my list elements to be named df_101_Merged, df_102_Merged, and so on.) Hmm... is there something I can do with the LineNum data in each df?

User response:

You haven't provided a reproducible example, but I'm going to make up some data and show how it's better to use lists than assign and get.

## each is just mtcars
my_files = c("mtcars.1.txt", "mtcars.2.txt", "mtcars.3.txt")

We can create a list of data frames like this:

my_list = lapply(my_files, function(x) read.table(x, sep=","))
[[1]]
    V1  V2    V3  V4   V5    V6    V7 V8 V9  V10  V11
1  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
2   21   6   160 110  3.9  2.62 16.46  0  1    4    4
3   21   6   160 110  3.9 2.875 17.02  0  1    4    4
4 22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
5 21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
6 18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
7 18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
8 14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
9 24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
 [ reached 'max' / getOption("max.print") -- omitted 24 rows ]

[[2]]
    V1  V2    V3  V4   V5    V6    V7 V8 V9  V10  V11
1  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
2   21   6   160 110  3.9  2.62 16.46  0  1    4    4
3   21   6   160 110  3.9 2.875 17.02  0  1    4    4
4 22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
5 21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
6 18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
7 18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
8 14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
9 24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
 [ reached 'max' / getOption("max.print") -- omitted 24 rows ]

[[3]]
    V1  V2    V3  V4   V5    V6    V7 V8 V9  V10  V11
1  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
2   21   6   160 110  3.9  2.62 16.46  0  1    4    4
3   21   6   160 110  3.9 2.875 17.02  0  1    4    4
4 22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
5 21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
6 18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
7 18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
8 14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
9 24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
 [ reached 'max' / getOption("max.print") -- omitted 24 rows ]
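
Adapted to the file names in the question, the same idea might look like the sketch below. It is self-contained, so it first writes three tiny made-up CSV files to a temporary folder; demo_dir, csv_files and df_list are names chosen for illustration, and the file contents are not the asker's data.

```r
# Write three small made-up CSV files so the example runs anywhere
demo_dir <- file.path(tempdir(), "line_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (n in c(101, 102, 103)) {
    write.csv(
        data.frame(Date = c("2020-01-02", "2020-01-03"), On = c(10, 20), LineNum = n),
        file.path(demo_dir, paste0("LineNum_", n, ".csv")),
        row.names = FALSE
    )
}

# Pick up every file named LineNum_<digits>.csv instead of typing the names by hand
csv_files <- list.files(demo_dir, pattern = "^LineNum_[0-9]+\\.csv$", full.names = TRUE)

# Read them all into one list and name each element after its line number
df_list <- lapply(csv_files, read.csv, encoding = "UTF-8")
names(df_list) <- sub("^LineNum_([0-9]+)\\.csv$", "df_Line\\1", basename(csv_files))

names(df_list)
head(df_list$df_Line102)
```

With the real data, demo_dir would be replaced by the working directory, and the pattern also skips unrelated files such as the notebook in the listing.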

Then suppose we want to apply a function to all data frames - we can define a function and use lapply to apply it to all data frames in the list. In this case, let's just sample 2 rows from each data frame.

library(dplyr)  # sample_n() comes from dplyr

random_function = function(x, n) {
    sample_n(x, n)
}

my_list = lapply(my_list, function(x) random_function(x, n=2))
[[1]]
    V1 V2   V3  V4   V5   V6    V7 V8 V9 V10 V11
1 32.4  4 78.7  66 4.08  2.2 19.47  1  1   4   1
2 15.5  8  318 150 2.76 3.52 16.87  0  0   3   2

[[2]]
    V1 V2  V3  V4   V5   V6    V7 V8 V9 V10 V11
1 10.4  8 472 205 2.93 5.25 17.98  0  0   3   4
2 13.3  8 350 245 3.73 3.84 15.41  0  0   3   4

[[3]]
    V1 V2  V3  V4   V5   V6    V7 V8 V9 V10 V11
1 22.8  4 108  93 3.85 2.32 18.61  1  1   4   1
2 13.3  8 350 245 3.73 3.84 15.41  0  0   3   4
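
One detail that is easy to miss: lapply returns a new list and leaves its input unchanged, so the result has to be assigned. In the question, the output of lapply(my_df, ...) is not stored, which would explain why summary(my_df) still shows 5 columns. A minimal sketch with made-up data, where add_share is a hypothetical stand-in for function_Merged:

```r
# Two tiny made-up data frames standing in for the imported CSV files
df_list <- list(
    df_Line101 = data.frame(On = c(4250, 4463, 3214), LineNum = 101),
    df_Line102 = data.frame(On = c(100, 200, 300), LineNum = 102)
)

# Hypothetical stand-in for function_Merged: adds one column
add_share <- function(df) {
    df$On_Share <- df$On / sum(df$On)
    df
}

# Without an assignment the result is only printed and then discarded
lapply(df_list, add_share)
ncol(df_list$df_Line101)      # still 2

# Assign the result to keep it
merged_list <- lapply(df_list, add_share)
ncol(merged_list$df_Line101)  # now 3
```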

To name the resulting list, we can use names() like:

## obviously change this based on how you want to name the list
library(stringr)
names(my_list) = str_remove_all(my_files, "\\.txt")
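
For the naming scheme asked about in the question (df_101_Merged and so on), base R's sub() can build the names from the file names. The sketch below uses made-up data frames; merged_list and target_env are names chosen for illustration. If separate objects are really needed, list2env() copies each named element into an environment, although keeping everything in the list is usually easier to work with.

```r
my_files <- c("LineNum_101.csv", "LineNum_102.csv", "LineNum_103.csv")

# Made-up stand-ins for the processed data frames
merged_list <- lapply(c(101, 102, 103), function(n) data.frame(LineNum = n, On = 1:3))

# Build names like df_101_Merged from the file names
names(merged_list) <- sub("^LineNum_([0-9]+)\\.csv$", "df_\\1_Merged", my_files)
names(merged_list)

# Access one data frame by name without leaving the list
head(merged_list[["df_102_Merged"]])

# Optional: copy each element into an environment as its own object.
# A fresh environment is used here; passing globalenv() instead would
# create the objects in the workspace.
target_env <- new.env()
list2env(merged_list, envir = target_env)
ls(target_env)
```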

I hope you can adapt this to your data.
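
For completeness, here is a sketch of how the loop from the question could be made to work with assign and get, using made-up data frames and a hypothetical stand-in (double_on) for function_Merged. The point is that dfName and dfMerged hold strings: get() looks up the object named by a string, and assign() creates an object named by a string, whereas `dfMerged <- ...` simply overwrites the variable dfMerged. The same applies to the first loop in the question, where `dfName <- read.csv(lineName)` replaces the string in dfName with the data frame rather than creating df_Line101.

```r
# Made-up data frames standing in for the ones read from the CSV files
df_Line101 <- data.frame(On = c(1, 2, 3))
df_Line102 <- data.frame(On = c(4, 5, 6))

# Hypothetical stand-in for function_Merged; it uses `$` like the real one
double_on <- function(df) {
    df$On2 <- df$On * 2
    df
}

Lines <- c(101, 102)
for (i in seq_along(Lines)) {
    dfName   <- paste0("df_Line", Lines[i])          # a string such as "df_Line101"
    dfMerged <- paste0("df_", Lines[i], "_Merged")   # a string such as "df_101_Merged"

    # get() fetches the data frame named by dfName;
    # assign() stores the result under the name held in dfMerged
    assign(dfMerged, double_on(get(dfName)))
}

# Passing the string itself fails, because `$` cannot be used on a character vector
res <- try(double_on(dfName), silent = TRUE)
inherits(res, "try-error")

df_102_Merged
```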
