I'm looking to find the maximum of each column, taken over a range of rows that is specified separately for each column.
My actual data frame is 50K columns and 1K rows so I can't use a loop without greatly increasing run time.
Data Frame:
row | V1 | V2 | V3 | V4 |
---|---|---|---|---|
1 | 5 | 2 | 4 | 5 |
2 | 3 | 5 | 1 | 6 |
3 | 7 | 3 | 2 | 6 |
4 | 2 | 5 | 3 | 10 |
5 | 6 | 9 | 1 | 2 |
beg_row <- c(2, 1, 2, 3)
end_row <- c(4, 3, 3, 5)
Expected output:
c(7, 5, 2, 10)
CodePudding user response:
You can try mapply (but I suspect it won't speed up the runtime much if you have a massive number of columns):
> mapply(function(x, y, z) max(x[y:z]), df[-1], beg_row, end_row)
V1 V2 V3 V4
7 5 2 10
Data
df <- structure(list(
  row = 1:5,
  V1 = c(5L, 3L, 7L, 2L, 6L),
  V2 = c(2L, 5L, 3L, 5L, 9L),
  V3 = c(4L, 1L, 2L, 3L, 1L),
  V4 = c(5L, 6L, 6L, 10L, 2L)
), class = "data.frame", row.names = c(NA, -5L))
beg_row <- c(2, 1, 2, 3)
end_row <- c(4, 3, 3, 5)
CodePudding user response:
An option with dplyr (df1 here is the data frame from the question):
library(dplyr)
df1 %>%
  summarise(across(-row, ~ {
    i1 <- match(cur_column(), names(df1)[-1])
    max(.x[beg_row[i1]:end_row[i1]])
  }))
V1 V2 V3 V4
1 7 5 2 10
Another option is to set the values outside each column's range to NA and then use colMaxs from matrixStats:
library(matrixStats)
colMaxs(as.matrix((NA^!(row(df1[-1]) >= beg_row[col(df1[-1])] &
row(df1[-1]) <= end_row[col(df1[-1])])) * df1[-1]), na.rm = TRUE)
[1] 7 5 2 10
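To make the masking idea easier to follow, here is a minimal base R sketch of the same steps, written out one at a time. It assumes the value columns have been copied into a numeric matrix (called m here, a name introduced only for this illustration) and uses apply in place of matrixStats::colMaxs, so it is likely slower on large data and is meant only to show the logic.

```r
# Value columns from the question's sample data, as a numeric matrix.
# The name `m` is an assumption made for this sketch.
m <- cbind(V1 = c(5, 3, 7, 2, 6),
           V2 = c(2, 5, 3, 5, 9),
           V3 = c(4, 1, 2, 3, 1),
           V4 = c(5, 6, 6, 10, 2))
beg_row <- c(2, 1, 2, 3)
end_row <- c(4, 3, 3, 5)

# row(m) and col(m) hold each cell's row and column index, so
# beg_row[col(m)] looks up the start row that applies to each cell.
in_range <- row(m) >= beg_row[col(m)] & row(m) <= end_row[col(m)]

# Replace every cell outside its column's window with NA.
masked <- m
masked[!in_range] <- NA

# Column-wise maximum, ignoring the masked cells.
res <- apply(masked, 2, max, na.rm = TRUE)
res
```

On the sample data this should reproduce the expected 7, 5, 2, 10. The whole matrix is compared in one vectorised step, at the cost of building index matrices the same size as the data.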
CodePudding user response:
Another possible solution, based on purrr::pmap_dbl:
library(purrr)
pmap_dbl(list(beg_row, end_row, 2:ncol(df)), ~ max(df[..1:..2, ..3]))
#> [1] 7 5 2 10
CodePudding user response:
With data as in @ThomasIsCoding's answer. Same basic approach as the others, but putting beg_row and end_row in a structure that roughly matches the data they correspond to.
library(purrr)
rbind(beg_row, end_row) %>%
  as.data.frame %>%
  map2_dbl(df[-1], ~ max(.y[exec(seq, !!!.x)]))
#> V1 V2 V3 V4
#> 7 5 2 10
If your start and end rows are very far apart, it may be faster to check whether df$row is between them with data.table::between rather than creating a new index vector to subset with each time. I'm not sure, though.
library(purrr)
library(data.table)
rbind(beg_row, end_row) %>%
  as.data.frame %>%
  map2_dbl(df[-1], ~ max(.y[exec(between, df$row, !!!.x)]))
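One way to check which form of subsetting is faster is to time both on simulated data. The sketch below is a rough base R harness, not a rigorous benchmark. The data, the sizes, and the names df_big, beg and end are all made up for illustration, and a plain logical comparison stands in for data.table::between so that no extra packages are needed.

```r
set.seed(1)  # hypothetical test data, not the asker's real data

n_row <- 1000
n_col <- 2000  # smaller than the 50K columns in the question
df_big <- as.data.frame(matrix(rnorm(n_row * n_col), n_row, n_col))

# Random start rows, with windows up to 200 rows long, capped at n_row.
beg <- sample(n_row, n_col, replace = TRUE)
end <- pmin(beg + sample(0:200, n_col, replace = TRUE), n_row)

idx <- seq_len(n_row)

# Variant 1: build an integer index b:e for every column.
t_seq <- system.time(
  r_seq <- mapply(function(x, b, e) max(x[b:e]), df_big, beg, end)
)

# Variant 2: subset with a logical "is the row index in [b, e]" mask.
t_lgl <- system.time(
  r_lgl <- mapply(function(x, b, e) max(x[idx >= b & idx <= e]),
                  df_big, beg, end)
)

# Both variants must agree before the timings mean anything.
stopifnot(identical(r_seq, r_lgl))
rbind(seq_index = t_seq, logical_mask = t_lgl)
```

Timings will vary by machine and by how wide the windows are, so it is worth rerunning with sizes closer to the real data before drawing conclusions.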