I have a data frame consisting of several sets that I load into r. I need the entire data frame and then one filtered based on the sets. Currently I bind the sets once to the unfiltered dataframe, then filter the sets and bind them again to the filtered dataframe.
Is there a way to filter the first data frame based on the sets so that I can skip this tedious step?
There are 10 sets, each containing more than 72 hours of time recording every 5 minutes, which start at different times. So I need a solution that does not select the data based on their row number but on the content (here set).
Example:
example<-data.frame(set = c("set1","set1","set1","set1","set1","set1","set1",
"set2","set2","set2","set2","set2","set2","set2"),
time = c("2022-01-01 05:55", "2022-01-01 06:00", "2022-01-01 06:05", "2022-01-01 06:10", "2022-01-01 06:15", "2022-01-01 06:20", "2022-01-01 06:25",
"2022-01-02 05:58", "2022-01-02 06:03", "2022-01-02 06:08", "2022-01-02 06:13", "2022-01-02 06:18", "2022-01-02 06:23", "2022-01-02 06:28")
)
I want the time 2022-01-01 06:00 to 2022-01-01 06:15 for set1 and the time 22-01-01 05:58 to 2022-01-02 06:13 for set2.
CodePudding user response:
You could create specific filters for each set like this. With dplyr
, you can take advantage of functions like between
, first
, last
, or nth
to make it easier.
library(dplyr)
example %>%
mutate(time = as.POSIXct(time)) %>%
filter((set == "set1" & between(time,
as.POSIXct("2022-01-01 06:00"),
as.POSIXct("2022-01-01 06:15"))) |
(set == "set2" & between(time,
first(time),
as.POSIXct("2022-01-02 06:13"))))
set time
1 set1 2022-01-01 06:00:00
2 set1 2022-01-01 06:05:00
3 set1 2022-01-01 06:10:00
4 set1 2022-01-01 06:15:00
5 set2 2022-01-02 05:58:00
6 set2 2022-01-02 06:03:00
7 set2 2022-01-02 06:08:00
8 set2 2022-01-02 06:13:00
CodePudding user response:
If you need to deal with different times for each sets and there are only 10 cases - I would write as an if else statement.
require(data.table)
setDT(example)
res <- rbindlist(lapply(unique(example[, set]), function(i) {
if (i == "set1") {
example[time > "2022-01-01 06:00" & time < "2022-01-01 06:15", ]
} else if (i == "set2") {
example[time > "2022-01-01 05:58" & time < "2022-01-02 06:13", ]
# else if (i == "setX") ...
} else {
example
}
}),use.names = TRUE, fill = TRUE)