I'm trying to use two for loops to
- import some CSV files into data frames, and
- apply my own function to those imported data frames.
I read up on the assign and get functions, and the first step (importing the CSVs into data frames) worked:
Lines <- c(101, 102, 103, 301, 311, 312, 514, 617, 706, 918)
for (i in 1:length(Lines)) {
  assign("dfName", paste0("df_Line", Lines[i]))
  assign("lineName", paste0("LineNum_", Lines[i], ".csv"))
  dfName <- read.csv(lineName, encoding = "UTF-8")
}
This code worked and gave me LineNum_101.csv as df_Line101, as I wanted, for 101 through 918.
So I went on and tried to apply function_Merged(df) (a function I wrote; it is at the end of this question), which was supposed to turn df_Line101 into df_101_Merged, for 101 through 918. What is wrong with this loop?
Lines <- c(101, 102, 103, 301, 311, 312, 514, 617, 706, 918)
for (i in 1:length(Lines)) {
  assign("dfName", paste0("df_Line", Lines[i]))
  assign("dfMerged", paste0("df_", Lines[i], "_Merged"))
  dfMerged <- function_Merged(dfName)
}
It says "Error: $ operator is invalid for atomic vectors".
I checked the variables themselves outside the loop, and
assign("dfName", paste0(("df_Line"),Lines[1]))
assign("dfMerged", paste0(("df_"),Lines[1],("_Merged")))
print(dfName)
print(dfMerged)
gave me [1] "df_Line101" and [1] "df_101_Merged" respectively.
I then tried to apply my function using these two assigned values. I've tried many, many things and I'm out of ideas.
get(dfMerged) <- function_Merged(get(dfName))
gave me [cannot find the function "get<-"]
get(dfMerged) <- function_Merged(dfName)
gave me [$ operator is invalid for atomic vectors]
dfMerged <- function_Merged(get(dfName))
worked, sort of: it created a whole new data frame literally named dfMerged. function_Merged itself did run, though.
What should I do to make this for loop work? Any help or suggestion would be very useful. Thanks!
22/05/24: a (hopefully) reproducible version
First, these are the libraries I used (I know many of them aren't actually needed for this code, but anyway):
library(ggplot2)
library(dplyr)
library(ggpmisc)
library(plotrix)
library(tidyverse)
library(lubridate)
library(ggrepel)
library(broom)
library(plotly)
library(reprex)
library(readxl)
library(zoo)
library(pracma)
I have about 100 CSV files in my working directory. The file names are in the format "LineNum_(number).csv".
list.files()
'0523_visualize.ipynb' 'LineNum_101.csv' 'LineNum_102.csv' 'LineNum_103.csv' 'LineNum_104.csv' 'LineNum_105.csv' 'LineNum_106.csv' 'LineNum_107.csv' 'LineNum_108.csv' 'LineNum_113.csv'
'LineNum_114.csv' 'LineNum_115.csv' 'LineNum_116.csv' 'LineNum_117.csv' 'LineNum_119.csv' 'LineNum_121.csv' 'LineNum_201.csv' 'LineNum_202.csv' 'LineNum_203.csv' 'LineNum_211.csv'
'LineNum_212.csv' 'LineNum_213.csv' 'LineNum_216.csv' 'LineNum_301.csv' 'LineNum_311.csv' 'LineNum_312.csv' 'LineNum_313.csv' 'LineNum_314.csv' 'LineNum_315.csv' 'LineNum_316.csv'
'LineNum_317.csv' 'LineNum_318.csv' 'LineNum_501.csv' 'LineNum_511.csv' 'LineNum_512.csv' 'LineNum_513.csv' 'LineNum_514.csv' 'LineNum_601.csv' 'LineNum_602.csv' 'LineNum_603.csv'
'LineNum_604.csv' 'LineNum_605.csv' 'LineNum_606.csv' 'LineNum_607.csv' 'LineNum_608.csv' 'LineNum_611.csv' 'LineNum_612.csv' 'LineNum_613.csv' 'LineNum_614.csv' 'LineNum_615.csv'
'LineNum_616.csv' 'LineNum_617.csv' 'LineNum_618.csv' 'LineNum_619.csv' 'LineNum_620.csv' 'LineNum_622.csv' 'LineNum_701.csv' 'LineNum_703.csv' 'LineNum_704.csv' 'LineNum_705.csv'
'LineNum_706.csv' 'LineNum_711.csv' 'LineNum_712.csv' 'LineNum_802.csv' 'LineNum_911.csv' 'LineNum_912.csv' 'LineNum_916.csv' 'LineNum_918.csv'
Each file looks like this:
df_Line311 <-read.csv("LineNum_311.csv", encoding = "UTF-8")
head(df_Line311, 5)
A data.frame: 5 × 5
Date On Off Transfer LineNum
<chr> <int> <int> <int> <int>
1 2020-01-02 15623 12250 3288 311
2 2020-01-03 16598 13078 3410 311
3 2020-01-04 12081 9771 2296 311
4 2020-01-05 9543 7556 1835 311
5 2020-01-06 14779 11607 3321 311
df_Line101 <-read.csv("LineNum_101.csv", encoding = "UTF-8")
head(df_Line101,5)
A data.frame: 5 × 5
Date On Off Transfer LineNum
<chr> <int> <int> <int> <int>
1 2020-01-02 4250 3725 1061 101
2 2020-01-03 4463 3910 1099 101
3 2020-01-04 3214 2847 753 101
4 2020-01-05 2977 2562 660 101
5 2020-01-06 4197 3673 1041 101
... and so on.
Here the On/Off/Transfer variables are the numbers of people who got on, got off, or transferred on bus line LineNum. For example, on 2020-01-02, 15623 people got on bus line 311.
Now I'm processing the data in three steps:
- keep only the workday (Tue, Wed, Thu) data
function_Workdays <- function(dataframe) {
  tempDF <- dataframe
  tempDF$Date <- as.Date(tempDF$Date)
  tempDF$Days <- weekdays(tempDF$Date)
  # 화요일, 수요일, 목요일 mean Tue, Wed, Thu in Korean
  # (the strings returned by weekdays() depend on the locale)
  tempDF$Workdays <- factor(tempDF$Days %in% c("화요일", "수요일", "목요일"))
  tempDF <- subset(tempDF, Workdays == TRUE)
  return(tempDF)
}
df_Line311_Workdays <- function_Workdays(df_Line311)
head(df_Line311_Workdays, 5)
A data.frame: 5 × 7
Date On Off Transfer LineNum Days Workdays
<date> <int> <int> <int> <int> <chr> <fct>
1 2020-01-02 15623 12250 3288 311 목요일 TRUE
6 2020-01-07 14779 11510 3125 311 화요일 TRUE
7 2020-01-08 15571 12315 3433 311 수요일 TRUE
8 2020-01-09 15828 12773 3383 311 목요일 TRUE
13 2020-01-14 15620 12721 3354 311 화요일 TRUE
- apply the running median function (runmed) from the stats package to the On values for smoothing
function_Runmed <- function(dataframe) {
  tempDF <- dataframe
  # running median over a window of 7 observations
  tempDF$On_RunMed <- runmed(tempDF$On, 7)
  return(tempDF)
}
df_Line311_Runmed <- function_Runmed(df_Line311_Workdays)
head(df_Line311_Runmed, 5)
Date On Off Transfer LineNum Days Workdays On_RunMed
<date> <int> <int> <int> <int> <chr> <fct> <dbl>
1 2020-01-02 15623 12250 3288 311 목요일 TRUE 15571
6 2020-01-07 14779 11510 3125 311 화요일 TRUE 15571
7 2020-01-08 15571 12315 3433 311 수요일 TRUE 15571
8 2020-01-09 15828 12773 3383 311 목요일 TRUE 15604
13 2020-01-14 15620 12721 3354 311 화요일 TRUE 15571
- apply the loess function, also from the stats package
function_Loess <- function(dataframe) {
  tempDF <- dataframe
  # loess needs a numeric predictor, so convert the dates to day counts
  tempDF$NumericDate <- as.numeric(tempDF$Date)
  LoessFunction <-
    stats::loess(On_RunMed ~ NumericDate, data = tempDF, span = 0.1)
  # se = TRUE also returns the standard error of each fitted value
  LoessFunction_value <- predict(LoessFunction, se = TRUE)
  Loess_Function_df <- data.frame(LoessFunction_value)
  tempDF$Loess_Fit <- Loess_Function_df$fit
  tempDF$Loess_SE <- Loess_Function_df$se.fit
  return(tempDF)
}
df_Line311_Runmed_Loess <- function_Loess(df_Line311_Runmed)
head(df_Line311_Runmed_Loess, 5)
A data.frame: 5 × 11
Date On Off Transfer LineNum Days Workdays On_RunMed NumericDate Loess_Fit Loess_SE
<date> <int> <int> <int> <int> <chr> <fct> <dbl> <dbl> <dbl> <dbl>
1 2020-01-02 15623 12250 3288 311 목요일 TRUE 15571 18263 15115.58 293.1331
6 2020-01-07 14779 11510 3125 311 화요일 TRUE 15571 18268 15437.50 210.3811
7 2020-01-08 15571 12315 3433 311 수요일 TRUE 15571 18269 15484.75 197.0860
8 2020-01-09 15828 12773 3383 311 목요일 TRUE 15604 18270 15526.54 184.9781
13 2020-01-14 15620 12721 3354 311 화요일 TRUE 15571 18275 15656.93 143.0892
And I merged those three steps into one function:
function_Merged <- function(dataframe) {
  df_Workdays <- function_Workdays(dataframe)
  df_Runmed <- function_Runmed(df_Workdays)
  df_Loess <- function_Loess(df_Runmed)
  return(df_Loess)
}
df_311_Merged <- function_Merged(df_Line311)
head(df_311_Merged, 5)
A data.frame: 5 × 11
Date On Off Transfer LineNum Days Workdays On_RunMed NumericDate Loess_Fit Loess_SE
<date> <int> <int> <int> <int> <chr> <fct> <dbl> <dbl> <dbl> <dbl>
1 2020-01-02 15623 12250 3288 311 목요일 TRUE 15571 18263 15115.58 293.1331
6 2020-01-07 14779 11510 3125 311 화요일 TRUE 15571 18268 15437.50 210.3811
7 2020-01-08 15571 12315 3433 311 수요일 TRUE 15571 18269 15484.75 197.0860
8 2020-01-09 15828 12773 3383 311 목요일 TRUE 15604 18270 15526.54 184.9781
13 2020-01-14 15620 12721 3354 311 화요일 TRUE 15571 18275 15656.93 143.0892
For the final step, I wanted to load all the CSV files in the working directory into data frames and apply this merged function, using the same naming pattern (automated, of course; this is the very reason I tried assign and get in a for loop).
Thanks to the answers I got, it is much easier with a list of data frames than with assign and get. I successfully loaded all the CSV files and applied my merged function to them.
my_list <- c("LineNum_101.csv", "LineNum_102.csv", "LineNum_103.csv")
my_df = lapply(my_list, function(x) read.csv(x, encoding = "UTF-8") )
lapply(my_df, function(x) function_Merged(x))
summary(my_df)
Length Class Mode
[1,] 5 data.frame list
[2,] 5 data.frame list
[3,] 5 data.frame list
my_df[1]
A data.frame: 786 × 5
Date On Off Transfer LineNum
<chr> <int> <int> <int> <int>
2020-01-02 4250 3725 1061 101
2020-01-03 4463 3910 1099 101
2020-01-04 3214 2847 753 101
2020-01-05 2977 2562 660 101
2020-01-06 4197 3673 1041 101
But I need those data frames to come out of the list under their own names after the merged function is applied. How can I do this? Is there a way to export each data frame in the list with an automated name? (I want my list elements to be named df_101_Merged, df_102_Merged, and so on.) Is there something I can do with the LineNum column in each data frame?
Answer:
You haven't provided a reproducible example, but I'm going to make up some data and show how it's better to use lists than assign and get.
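For context on why the loop in the question fails: assign("dfName", "df_Line101") only stores the string "df_Line101" in a variable called dfName, so passing dfName to the function hands it a character vector, and using $ on that raises the atomic-vector error. Below is a minimal sketch of how assign and get would have to be combined. The data values are made up, and add_total() is a hypothetical stand-in for function_Merged.

```r
# Stand-in data; the column names mirror the question, the values are made up.
df_Line101 <- data.frame(On = c(10, 20), Off = c(5, 15))

# Hypothetical helper standing in for function_Merged()
add_total <- function(df) {
  df$Total <- df$On + df$Off
  df
}

dfName   <- "df_Line101"     # a string, not the data frame itself
dfMerged <- "df_101_Merged"  # the name the result should be stored under

# get() looks up the object whose name is stored in dfName;
# assign() creates a variable whose name is stored in dfMerged.
assign(dfMerged, add_total(get(dfName)))

print(df_101_Merged)
```

This works, but every later step has to go through get() and assign() again, which is the bookkeeping a list avoids.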
## each is just mtcars
my_files = c("mtcars.1.txt", "mtcars.2.txt", "mtcars.3.txt")
We can create a list of data frames like this:
my_list = lapply(my_files, function(x) read.table(x, sep=","))
[[1]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 mpg cyl disp hp drat wt qsec vs am gear carb
2 21 6 160 110 3.9 2.62 16.46 0 1 4 4
3 21 6 160 110 3.9 2.875 17.02 0 1 4 4
4 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
6 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
7 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
8 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
9 24.4 4 146.7 62 3.69 3.19 20 1 0 4 2
[ reached 'max' / getOption("max.print") -- omitted 24 rows ]
[[2]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 mpg cyl disp hp drat wt qsec vs am gear carb
2 21 6 160 110 3.9 2.62 16.46 0 1 4 4
3 21 6 160 110 3.9 2.875 17.02 0 1 4 4
4 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
6 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
7 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
8 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
9 24.4 4 146.7 62 3.69 3.19 20 1 0 4 2
[ reached 'max' / getOption("max.print") -- omitted 24 rows ]
[[3]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 mpg cyl disp hp drat wt qsec vs am gear carb
2 21 6 160 110 3.9 2.62 16.46 0 1 4 4
3 21 6 160 110 3.9 2.875 17.02 0 1 4 4
4 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
6 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
7 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
8 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
9 24.4 4 146.7 62 3.69 3.19 20 1 0 4 2
[ reached 'max' / getOption("max.print") -- omitted 24 rows ]
Then suppose we want to apply a function to all data frames - we can define a function and use lapply to apply it to all data frames in the list. In this case, let's just sample 2 rows from each data frame.
library(dplyr)  # provides sample_n()
random_function = function(x, n) {
  sample_n(x, n)
}
my_list = lapply(my_list, function(x) random_function(x, n=2))
[[1]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 32.4 4 78.7 66 4.08 2.2 19.47 1 1 4 1
2 15.5 8 318 150 2.76 3.52 16.87 0 0 3 2
[[2]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4
2 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4
[[3]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
2 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4
To name the resulting list, we can use names() like:
## obviously change this based on how you want to name the list
library(stringr)
names(my_list) = str_remove_all(my_files, "\\.txt")
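For the naming pattern asked about in the question (df_101_Merged, df_102_Merged, ...), something like the following base R sketch might work. It assumes the file names follow the "LineNum_(number).csv" format exactly; the list contents here are placeholders, not real data.

```r
# File names in the format described in the question
my_files <- c("LineNum_101.csv", "LineNum_102.csv", "LineNum_103.csv")

# Capture the digits between "LineNum_" and ".csv" and rebuild the name
new_names <- sub("^LineNum_([0-9]+)\\.csv$", "df_\\1_Merged", my_files)
print(new_names)

# Placeholder list; in practice this would be the list of processed data frames
my_list <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))
names(my_list) <- new_names

# Elements can now be looked up by name
print(my_list[["df_102_Merged"]])
```

Anchoring the pattern with ^ and $ means a file that does not match (such as the notebook in the directory listing) keeps its original name rather than being silently mangled.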
I hope you can adapt this to your data.
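If the data frames really do need to exist as separate variables afterwards, base R's list2env() can create one variable per named list element. This is a sketch under assumptions: it presumes the result of lapply() was assigned back to a list and that the list has already been named; the values below are made up.

```r
# Placeholder named list; the names follow the pattern asked for in the question
merged_list <- list(
  df_101_Merged = data.frame(On = c(4250, 4463)),
  df_102_Merged = data.frame(On = c(3100, 3300))
)

# Usually it is enough to index the list by name
print(merged_list[["df_101_Merged"]])

# If separate variables are really needed, list2env() creates one per element
invisible(list2env(merged_list, envir = globalenv()))

print(exists("df_102_Merged"))
```

Keeping the data in the list is generally easier to work with, since any later step can again be applied to all elements with lapply.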