I have 3 data frames with the same dimensions. I want to create a data frame that holds the element-wise minimum across the 3 data frames. Is there a more efficient way than looping over each column and then each row, going through the elements one by one?
Dataframe X
| Column A | Column B |
| -------- | -------- |
| Cell X1 | Cell X2 |
| Cell X3 | Cell X4 |
Dataframe Y
| Column A | Column B |
| -------- | -------- |
| Cell Y1 | Cell Y2 |
| Cell Y3 | Cell Y4 |
Dataframe Z
| Column A | Column B |
| -------- | -------- |
| Cell Z1 | Cell Z2 |
| Cell Z3 | Cell Z4 |
Dataframe Target Output
| Column A | Column B |
| -------- | -------- |
| Min (Cell X1,Y1,Z1) | Min (Cell X2,Y2,Z2) |
| Min (Cell X3,Y3,Z3) | Min (Cell X4,Y4,Z4) |
Thank you!
I tried a simple loop over each column and then each row: for (c in 1:3){ for (r in 1:2){ ........ } }
CodePudding user response:
You can use pmin() from base R. It returns the element-wise ("parallel") minimum of several vectors, so calling it on the matching columns of the three data frames gives one column of the result.
For example:
df_target = data.frame(c1 = pmin(X$ColumnA, Y$ColumnA, Z$ColumnA),
c2 = pmin(X$ColumnB, Y$ColumnB, Z$ColumnB))
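Since the question has no sample data, here is a minimal sketch of the pmin() call on made-up values. The numbers and the column names ColumnA and ColumnB are assumptions for illustration only.

```r
# Hypothetical sample data (values and column names are assumptions)
X <- data.frame(ColumnA = c(1, 8), ColumnB = c(5, 2))
Y <- data.frame(ColumnA = c(4, 3), ColumnB = c(6, 9))
Z <- data.frame(ColumnA = c(7, 0), ColumnB = c(1, 4))

# pmin() compares the vectors position by position
df_target <- data.frame(
  ColumnA = pmin(X$ColumnA, Y$ColumnA, Z$ColumnA),
  ColumnB = pmin(X$ColumnB, Y$ColumnB, Z$ColumnB)
)
print(df_target)
```

With these values the first row of df_target is (1, 1) and the second is (0, 2).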
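Or you can bind the matching columns together with cbind() and use apply() to take the min() of each row. Note that this needs min, not pmin: applied to a single row, pmin just returns that row unchanged.
df_target = data.frame(c1 = apply(cbind(X$ColumnA, Y$ColumnA, Z$ColumnA), 1, min),
c2 = apply(cbind(X$ColumnB, Y$ColumnB, Z$ColumnB), 1, min))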
CodePudding user response:
You could loop over the columns and use pmin:
as.data.frame(lapply(1:ncol(X), \(i) pmin(X[[i]], Y[[i]], Z[[i]])))
If you need to generalize to more data frames, I would put them in a list and use do.call with pmin.
(Answer is untested as no sample data is provided.)
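One way the list idea might look is sketched below. The sample values, the column names, and the name df_list are assumptions. It combines do.call with Map so that pmin is called once per column with that column taken from every data frame in the list.

```r
# Hypothetical sample data (values and column names are assumptions)
X <- data.frame(ColumnA = c(1, 8), ColumnB = c(5, 2))
Y <- data.frame(ColumnA = c(4, 3), ColumnB = c(6, 9))
Z <- data.frame(ColumnA = c(7, 0), ColumnB = c(1, 4))

# Any number of data frames with identical dimensions
df_list <- list(X, Y, Z)

# Equivalent to Map(pmin, X, Y, Z): one pmin() call per column position
mins <- do.call(Map, c(list(f = pmin), df_list))

# Map() returns a list named after the first data frame's columns
result <- as.data.frame(mins)
print(result)
```

Adding a fourth data frame only means appending it to df_list; the rest of the code stays the same.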
The more general solution would be to stack the data frames into a 3-d array (for example with abind() from the abind package) and use apply:
library(abind)
arr = abind(X, Y, Z, along = 3)
result = apply(arr, 1:2, min)
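If installing abind is not an option, the same 3-d array can be built with base R alone. This is a sketch that swaps in array() for abind(); the sample values and column names are assumptions, and it only works when all columns are numeric.

```r
# Hypothetical sample data (values and column names are assumptions)
X <- data.frame(ColumnA = c(1, 8), ColumnB = c(5, 2))
Y <- data.frame(ColumnA = c(4, 3), ColumnB = c(6, 9))
Z <- data.frame(ColumnA = c(7, 0), ColumnB = c(1, 4))

# Convert each data frame to a numeric matrix
mats <- lapply(list(X, Y, Z), as.matrix)

# Stack the matrices along a third dimension: rows x columns x data frames
arr <- array(unlist(mats), dim = c(nrow(X), ncol(X), length(mats)))

# Collapse the third dimension by taking the minimum of each (row, column) cell
result <- as.data.frame(apply(arr, 1:2, min))
names(result) <- names(X)
print(result)
```

Because apply() accepts any function here, replacing min with max or median gives other cell-wise summaries without changing the rest.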
CodePudding user response:
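Other options, assuming the data frames are stored in a list, e.g. df_list = list(X, Y, Z).
Tidyverse (note that lift() is deprecated as of purrr 1.0.0):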
library(purrr)
map_dfc(transpose(df_list), lift(pmin))
data.table:
library(data.table)
rbindlist(df_list, idcol = "group")[,lapply(.SD,min),rowid(group)]
Base R:
aggregate(do.call(rbind,df_list), list(sequence(sapply(df_list, nrow))), min)
EDIT:
Another base R approach:
Reduce(function(x, y) replace(x, x > y, y[x > y]), df_list)