I have the following data set:
PATH <- c("5-8-10-8-17-20",
          "56-85-89-89-0-15-88-10",
          "58-85-89-65-49-51")
INDX <- c(18, 89, 50)
df <- data.frame(PATH, INDX)
df
PATH | INDX |
---|---|
5-8-10-8-17-20 | 18 |
56-85-89-89-0-15-88-10 | 89 |
58-85-89-65-49-51 | 50 |
The column PATH holds strings that each represent a series of numbers. For each row, I want to pick the largest number in the string that satisfies PATH <= INDX, that is, the number in PATH that is equal to INDX or, if there is none, the largest number in PATH that is still less than INDX.
My desired output would look like this:
PATH | INDX | PICK |
---|---|---|
5-8-10-8-17-20 | 18 | 17 |
56-85-89-89-0-15-88-10 | 89 | 89 |
58-85-89-65-49-51 | 50 | 49 |
Some of my thought process so far:
I know that with a function such as strsplit I could split each string on "-", sort the numbers, subtract INDX from each, and then select the difference closest to zero that is negative or zero. However, the original dataset is quite large, and I wonder if there is a faster or more efficient way to perform this task.
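A minimal sketch of that split-and-subtract idea, for reference. The helper name pick_by_diff is hypothetical, and returning NA when no number qualifies is an assumption rather than a stated requirement. Sorting turns out not to be needed here, because which.max() finds the difference closest to zero directly.

```r
PATH <- c("5-8-10-8-17-20",
          "56-85-89-89-0-15-88-10",
          "58-85-89-65-49-51")
INDX <- c(18, 89, 50)

# Hypothetical helper: handles a single path string and its index
pick_by_diff <- function(path, indx) {
  nums <- as.numeric(strsplit(path, "-", fixed = TRUE)[[1]])
  d <- nums - indx                  # negative or zero means nums <= indx
  candidates <- d <= 0
  if (!any(candidates)) return(NA_real_)  # assumption: NA when nothing qualifies
  # The largest non-positive difference belongs to the wanted number
  nums[candidates][which.max(d[candidates])]
}

PICK <- mapply(pick_by_diff, PATH, INDX, USE.NAMES = FALSE)
PICK
```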
CodePudding user response:
Another option (the \(x, y) lambda and the |> pipe need R 4.1 or later):
mapply(
  \(x, y) max(x[x <= y]),
  strsplit(PATH, "-") |> lapply(as.integer),
  INDX
)
# [1] 17 89 49
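One caveat with max(x[x <= y]): if no number in a path is at or below its INDX, max() receives an empty vector and returns -Inf with a warning. Below is a minimal sketch of a guarded variant, assuming NA is an acceptable result for such rows. The helper name pick_le is hypothetical and not part of the answer above.

```r
PATH <- c("5-8-10-8-17-20",
          "56-85-89-89-0-15-88-10",
          "58-85-89-65-49-51")
INDX <- c(18, 89, 50)

# Hypothetical helper: largest value in x that is <= y, or NA if none qualifies
pick_le <- function(x, y) {
  ok <- x[x <= y]
  if (length(ok) == 0L) NA_real_ else max(ok)
}

# Split once, convert to numeric, then apply the helper row by row
nums <- lapply(strsplit(PATH, "-", fixed = TRUE), as.numeric)
PICK <- mapply(pick_le, nums, INDX)
PICK
# [1] 17 89 49

# A row whose numbers all exceed the index gives NA instead of -Inf
pick_le(c(5, 8), 3)
# [1] NA
```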
CodePudding user response:
Using purrr::map2_dbl():
library(purrr)
PICK <- map2_dbl(
  strsplit(PATH, "-"),
  INDX,
  ~ max(
    as.numeric(.x)[as.numeric(.x) <= .y]
  )
)
PICK
# [1] 17 89 49
CodePudding user response:
The code below should be reasonably efficient; there is nothing wrong with your approach.
# lapply() always returns a list, even if every path has the same length
numpath <- lapply(strsplit(PATH, "-"), as.numeric)
maxindexes <- lapply(seq_along(numpath), function(x) which(numpath[[x]] <= INDX[x]))
result <- sapply(seq_along(numpath), function(x) max(numpath[[x]][maxindexes[[x]]]))
result
# [1] 17 89 49
CodePudding user response:
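Using dplyr:
library(dplyr)
# df is the data frame from the question: data.frame(PATH, INDX)
df |>
  rowwise() |>
  mutate(across(PATH, ~ {
    # Split the current row's path and convert it to numbers once
    a <- as.numeric(unlist(strsplit(.x, split = "-")))
    max(a[a <= INDX])
  }, .names = "PICK"))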
PATH INDX PICK
<chr> <dbl> <dbl>
1 5-8-10-8-17-20 18 17
2 56-85-89-89-0-15-88-10 89 89
3 58-85-89-65-49-51 50 49
CodePudding user response:
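You can create a custom function like the one below:
# df is the data frame from the question: data.frame(PATH, INDX)
my_func <- function(vec1, vec2) {
  # Split the path on "-", convert to numbers, and sort ascending
  x <- sort(as.numeric(unlist(strsplit(vec1, split = "-"))))
  # After sorting, the count of values <= vec2 is the position of the largest one
  x[max(cumsum(x <= vec2))]
}
df$PICK <- sapply(seq_len(nrow(df)), function(i) my_func(df$PATH[i], df$INDX[i]))
This yields the following output: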
# PATH INDX PICK
# 1 5-8-10-8-17-20 18 17
# 2 56-85-89-89-0-15-88-10 89 89
# 3 58-85-89-65-49-51 50 49
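Since the question mentions a large dataset, here is a sketch of a variant that avoids calling a function once per row: all paths are flattened into one long vector, each value is tagged with its row number, and the maximum qualifying value is taken per row. This is an illustration only. Whether it is actually faster would need benchmarking on the real data, and the NA result for rows with no qualifying number is an assumption.

```r
PATH <- c("5-8-10-8-17-20",
          "56-85-89-89-0-15-88-10",
          "58-85-89-65-49-51")
INDX <- c(18, 89, 50)

parts  <- strsplit(PATH, "-", fixed = TRUE)
row_id <- rep(seq_along(parts), lengths(parts))  # row number of each value
vals   <- as.numeric(unlist(parts))              # all numbers in one vector

# Keep only values that do not exceed the INDX of their own row
keep <- vals <= INDX[row_id]

# Maximum kept value per row; rows with no kept value stay NA
best <- tapply(vals[keep], row_id[keep], max)
PICK <- rep(NA_real_, length(PATH))
PICK[as.integer(names(best))] <- best
PICK
# [1] 17 89 49
```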