I have a dataframe looks like below:
Place Time1 Time2 Time3 Time4 Time5 Time6 Time7 Time8 Time9 ...
CA 0.2 0.3 0.1 0.4
MN 0.01 0.19 0.1 0.2 0.1 0.1 0.1 0.1 0.1
NY 0.15 0.05 0.2 0.1 0.1 0.2 0.2
There are many more time columns. For each row, the values of the time columns sum to 1. I want to rank the time values for each row from large to small and get the time points when the sum of these time points first pass 0.25. For example, below is a sample output (NY has three values because none of them passes 0.25 but they are tie):
CA Time 4
MN Time 4 Time 2
NY Time 3 Time 6 Time 7
CodePudding user response:
You may get the data in long format, arrange
the rows by Place
and decreasing value
. For each Place
select the top values who sum up to 0.25. In case of tie, this would select all the similar values.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -Place,
values_drop_na = TRUE) %>%
arrange(Place, desc(value)) %>%
group_by(Place) %>%
summarise(result = toString(name[value %in%
value[seq_len(match(TRUE, cumsum(value) > 0.25))]]))
# Place result
# <chr> <chr>
#1 CA Time4
#2 MN Time4, Time2
#3 NY Time3, Time6, Time7
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(Place = c("CA", "MN", "NY"), Time1 = c(0.2, 0.01,
0.15), Time2 = c(0.3, 0.19, 0.05), Time3 = c(0.1, 0.1, 0.2),
Time4 = c(0.4, 0.2, 0.1), Time5 = c(NA, 0.1, 0.1), Time6 = c(NA,
0.1, 0.2), Time7 = c(NA, 0.1, 0.2), Time8 = c(NA, 0.1, NA
), Time9 = c(NA, 0.1, NA)), class = "data.frame", row.names = c(NA, -3L))