Home > database >  R Dataframe Filter Values
R Dataframe Filter Values

Time:05-15

I have a dataframe looks like below:

Place   Time1   Time2    Time3    Time4    Time5    Time6    Time7   Time8   Time9 ...
CA      0.2     0.3      0.1      0.4       
MN      0.01    0.19     0.1      0.2      0.1      0.1      0.1     0.1     0.1
NY      0.15    0.05     0.2      0.1      0.1      0.2      0.2

There are many more time columns. For each row, the values of the time columns sum to 1. I want to rank the time values for each row from large to small and get the time points when the sum of these time points first pass 0.25. For example, below is a sample output (NY has three values because none of them passes 0.25 but they are tie):

CA      Time 4    
MN      Time 4 Time 2
NY      Time 3 Time 6 Time 7

CodePudding user response:

You may get the data in long format, arrange the rows by Place and decreasing value. For each Place select the top values who sum up to 0.25. In case of tie, this would select all the similar values.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -Place, 
               values_drop_na = TRUE) %>%
  arrange(Place, desc(value)) %>%
  group_by(Place) %>%
  summarise(result = toString(name[value %in% 
                      value[seq_len(match(TRUE, cumsum(value) > 0.25))]]))

#  Place result             
#  <chr> <chr>              
#1 CA    Time4              
#2 MN    Time4, Time2       
#3 NY    Time3, Time6, Time7

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(Place = c("CA", "MN", "NY"), Time1 = c(0.2, 0.01, 
0.15), Time2 = c(0.3, 0.19, 0.05), Time3 = c(0.1, 0.1, 0.2), 
    Time4 = c(0.4, 0.2, 0.1), Time5 = c(NA, 0.1, 0.1), Time6 = c(NA, 
    0.1, 0.2), Time7 = c(NA, 0.1, 0.2), Time8 = c(NA, 0.1, NA
    ), Time9 = c(NA, 0.1, NA)), class = "data.frame", row.names = c(NA, -3L))
  • Related