Applying a custom function on a particular column of multiple dataframes in R

Time:02-12

I have multiple dataframes (df1, df2, df3, df4, df5), and each has the same two columns: date, which holds dates (e.g. 2020-11-12) as character strings, and price, which is numeric. For example, df1 looks like this:

date       price
2020-11-12 29.75
2020-11-13 29.95
2020-11-14 30.72
2020-11-15 32.83
2020-11-16 33.14

I am trying to use lapply with a custom function that converts the character date column to the Date class. However, lapply doesn't give me a reformatted date column. My code is as follows:

df.list <- list(df1, df2, df3, df4, df5)  # create a list of dataframes

# create a custom function to change the class of date column
date_con <- function(x) {
  x$date <- as.Date(x$date, format = "%Y-%m-%d")
}

When I use

lapply(df.list, date_con)

the date columns remain character. For instance, class(df1$date) still returns "character" rather than "Date". If I instead convert each dataframe manually, it works, but I don't want to go through hundreds of dataframes by hand. That is,

df1$date <- as.Date(df1$date, format="%Y-%m-%d") 

works, but it is obviously not efficient, and I am sure there is a better way. So, how can I use lapply or some other approach to convert the character date column into a Date column across a large number of dataframes?

CodePudding user response:

The issue here is that R passes arguments by value, not by reference or pointer. The function works on a copy, so the original objects are not modified. The function also has to return the modified dataframe. To update the original objects, name the elements of the list and write the results back to the environment with list2env.
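The by-value behaviour can be seen in a minimal sketch. The names df_demo and set_date are hypothetical and used only for illustration:

```r
# Hypothetical example data: one data frame with a character date column
df_demo <- data.frame(date = c("2020-11-12", "2020-11-13"),
                      price = c(29.75, 29.95),
                      stringsAsFactors = FALSE)

# The function works on its own copy of x and returns the modified copy
set_date <- function(x) {
  x$date <- as.Date(x$date, format = "%Y-%m-%d")
  x
}

converted <- set_date(df_demo)

class(df_demo$date)    # "character": the caller's object is untouched
class(converted$date)  # "Date": only the returned copy has the new class
```

The conversion only becomes visible once the returned value is assigned to something, which is what the steps below do for every dataframe at once.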

Name the elements in your list:

df.list <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5)

# function: convert the date column, then return the modified dataframe
date_con <- function(x) {
    x$date <- as.Date(x$date, format="%Y-%m-%d")
    x
}

Now run

list2env(lapply(df.list, date_con), .GlobalEnv)

You can now check the class of the date column in your original dataframe:

class(df1$date)
[1] "Date"
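With hundreds of dataframes, typing every name into list() is impractical. One possible sketch collects them with mget() instead. It assumes the dataframes live in the global environment and that their names follow the pattern df1, df2, ... as in the question. The two small dataframes below are made-up stand-ins for the real data:

```r
# Hypothetical stand-ins for the real data frames
df1 <- data.frame(date = c("2020-11-12", "2020-11-13"), price = c(29.75, 29.95))
df2 <- data.frame(date = c("2020-11-14", "2020-11-15"), price = c(30.72, 32.83))

date_con <- function(x) {
  x$date <- as.Date(x$date, format = "%Y-%m-%d")
  x
}

# Names of all global objects that look like df<number> (assumed naming scheme)
df_names <- ls(envir = .GlobalEnv, pattern = "^df[0-9]+$")

# mget() returns a list named after the objects, so no manual naming is needed
df.list <- mget(df_names, envir = .GlobalEnv)

# Convert each data frame and overwrite the originals in the global environment
invisible(list2env(lapply(df.list, date_con), envir = .GlobalEnv))

class(df2$date)
```

If other objects happen to match the pattern, the regular expression would need to be tightened. Keeping the dataframes in the list and skipping list2env() altogether is often easier to manage.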

CodePudding user response:

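Your function doesn't return x. Its last expression is an assignment, so lapply receives only the assigned value (the Date vector) instead of the modified dataframe. Return x as the last expression and assign the result of lapply back to the list. Note that the \(x) shorthand requires R 4.1 or later; function(x) works in any version.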

df_list <- lapply(df_list, \(x) {x$date <- as.Date(x$date);x})
str(df_list)
# List of 3
#  $ :'data.frame':  6 obs. of  2 variables:
#   ..$ date : Date[1:6], format: "2020-11-12" ...
#   ..$ price: num [1:6] 29.8 29.9 30 30.4 30.2 ...
#  $ :'data.frame':  6 obs. of  2 variables:
#   ..$ date : Date[1:6], format: "2020-11-12" ...
#   ..$ price: num [1:6] 29.8 29.9 30 30.4 30.2 ...
#  $ :'data.frame':  6 obs. of  2 variables:
#   ..$ date : Date[1:6], format: "2020-11-12" ...
#   ..$ price: num [1:6] 29.8 29.9 30 30.4 30.2 ...

Data:

df_list <- list(structure(list(date = c("2020-11-12", "2020-11-13", "2020-11-14", 
"2020-11-15", "2020-11-16", "2020-11-17"), price = c(29.75, 29.94, 
29.97, 30.37, 30.23, 30.22)), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(date = c("2020-11-12", "2020-11-13", "2020-11-14", 
"2020-11-15", "2020-11-16", "2020-11-17"), price = c(29.75, 29.94, 
29.97, 30.37, 30.23, 30.22)), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(date = c("2020-11-12", "2020-11-13", "2020-11-14", 
"2020-11-15", "2020-11-16", "2020-11-17"), price = c(29.75, 29.94, 
29.97, 30.37, 30.23, 30.22)), class = "data.frame", row.names = c(NA, 
-6L)))
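
The effect of the missing return value can be checked directly. The sketch below compares the function from the question with a corrected version on a small made-up dataframe. The names df_demo, no_return and with_return are hypothetical:

```r
# Made-up example data
df_demo <- data.frame(date = c("2020-11-12", "2020-11-13"),
                      price = c(29.75, 29.95))

# Function as written in the question: the last expression is an assignment
no_return <- function(x) {
  x$date <- as.Date(x$date, format = "%Y-%m-%d")
}

# Corrected function: the modified data frame is the last expression
with_return <- function(x) {
  x$date <- as.Date(x$date, format = "%Y-%m-%d")
  x
}

res_bad  <- lapply(list(df_demo), no_return)
res_good <- lapply(list(df_demo), with_return)

class(res_bad[[1]])   # "Date": only the converted vector comes back
class(res_good[[1]])  # "data.frame": the whole data frame comes back
```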