Home > OS >  Filter values based on other values in r
Filter values based on other values in r

Time:09-02

I have a data frame consisting of several sets that I load into r. I need the entire data frame and then one filtered based on the sets. Currently I bind the sets once to the unfiltered dataframe, then filter the sets and bind them again to the filtered dataframe.

Is there a way to filter the first data frame based on the sets so that I can skip this tedious step?

There are 10 sets, each containing more than 72 hours of time recording every 5 minutes, which start at different times. So I need a solution that does not select the data based on their row number but on the content (here set).

Example:

example<-data.frame(set = c("set1","set1","set1","set1","set1","set1","set1",
                            "set2","set2","set2","set2","set2","set2","set2"),
                    time = c("2022-01-01 05:55", "2022-01-01 06:00", "2022-01-01 06:05", "2022-01-01 06:10", "2022-01-01 06:15", "2022-01-01 06:20", "2022-01-01 06:25",
                             "2022-01-02 05:58", "2022-01-02 06:03", "2022-01-02 06:08", "2022-01-02 06:13", "2022-01-02 06:18", "2022-01-02 06:23", "2022-01-02 06:28")
                    )

I want the time 2022-01-01 06:00 to 2022-01-01 06:15 for set1 and the time 22-01-01 05:58 to 2022-01-02 06:13 for set2.

CodePudding user response:

You could create specific filters for each set like this. With dplyr, you can take advantage of functions like between, first, last, or nth to make it easier.

library(dplyr)
example %>% 
  mutate(time = as.POSIXct(time)) %>% 
  filter((set == "set1" & between(time,
                                as.POSIXct("2022-01-01 06:00"),
                                as.POSIXct("2022-01-01 06:15"))) |
         (set == "set2" & between(time,
                                first(time),
                                as.POSIXct("2022-01-02 06:13"))))
   set                time
1 set1 2022-01-01 06:00:00
2 set1 2022-01-01 06:05:00
3 set1 2022-01-01 06:10:00
4 set1 2022-01-01 06:15:00
5 set2 2022-01-02 05:58:00
6 set2 2022-01-02 06:03:00
7 set2 2022-01-02 06:08:00
8 set2 2022-01-02 06:13:00

CodePudding user response:

If you need to deal with different times for each sets and there are only 10 cases - I would write as an if else statement.

require(data.table)
setDT(example)
res <- rbindlist(lapply(unique(example[, set]), function(i) {
    if (i == "set1") {
        example[time > "2022-01-01 06:00" & time < "2022-01-01 06:15", ]
    } else if (i == "set2") {
        example[time > "2022-01-01 05:58" & time < "2022-01-02 06:13", ]
    # else if (i == "setX") ...
    } else {
        example
    }
}),use.names = TRUE, fill = TRUE)
  • Related