Applying a Function to a Data Frame : lapply vs traditional way

Time:03-29

I have this data frame in R:

x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)

I also have this function:

some_function <- function(x,y) { return(x + y) }

Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:

data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)

This does not work:

Error in `$<-.data.frame`(`*tmp*`, f, value = list()) : 
  replacement has 0 rows, data has 8281

I know how to do this in a more "clunky and traditional" way:

data_frame$new_column = x + y

But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?

Thank you!

CodePudding user response:

When working within a data.frame you could use apply instead of lapply:

x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x + y) }

data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)

To apply a function to rows, set MARGIN = 1; to apply it to columns, set MARGIN = 2.
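As a small illustration of the MARGIN argument, here is a sketch on a made-up matrix (m is hypothetical and not part of the question):

```r
# Hypothetical 2 x 3 matrix, used only to show what MARGIN does
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)  # MARGIN = 2: one result per column
```

row_sums has one element per row of m and col_sums one element per column.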

lapply, as the name suggests, is a list-apply. Since a data.frame is a list of columns, you can use lapply to compute over columns, but for rectangular data apply is often the easiest.
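If you do want to stay within the lapply family, the following is a minimal sketch of two options, assuming some_function adds its two arguments. Note that expand.grid(x, y) names its columns Var1 and Var2, so data_frame$x and data_frame$y are both NULL; lapply over NULL returns an empty list, which matches the "replacement has 0 rows" error in the question. The column names via_mapply and via_lapply are made up for this example.

```r
x <- seq(1, 10, 0.1)
y <- seq(1, 10, 0.1)
data_frame <- expand.grid(x, y)  # columns are named Var1 and Var2
some_function <- function(x, y) x + y

# mapply walks both columns in parallel, one pair of values per call
data_frame$via_mapply <- mapply(some_function, data_frame$Var1, data_frame$Var2)

# lapply over row indices returns a list, so unlist() it before assigning
data_frame$via_lapply <- unlist(
  lapply(seq_len(nrow(data_frame)),
         function(i) some_function(data_frame$Var1[i], data_frame$Var2[i]))
)
```

mapply is usually the closer fit here, because the function takes two arguments and lapply only iterates over a single list.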

If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame, as in:

x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)

some_function <- function(row) { return(row[1] + row[2]) }

data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)

Final comment: Functions written for a single pair of values often turn out to be fully vectorized already. Probably the best way to call some_function is without any function of the apply family, as in:

some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
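
To convince yourself that the vectorized call and the row-wise apply call agree, a quick check along these lines should do. This is a sketch that assumes some_function is a simple sum; the variable names vectorized, row_wise and same are made up for the example.

```r
x <- seq(1, 10, 0.1)
y <- seq(1, 10, 0.1)
data_frame <- expand.grid(x, y)
some_function <- function(x, y) x + y

# One call on whole columns versus one call per row
vectorized <- some_function(data_frame$Var1, data_frame$Var2)
row_wise <- apply(data_frame, 1, function(r) some_function(r["Var1"], r["Var2"]))

# unname() drops any names apply() may attach before comparing
same <- isTRUE(all.equal(vectorized, unname(row_wise)))
```

If same is TRUE, the vectorized form is the one to prefer: it is shorter and avoids calling the function once per row.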