I have a dataset with temperature data for each day, which I grouped by date, so I now have a list with one data frame per day. What I want to do is filter all of these data frames by a range. The filter is the mean temperature for that day (data frame) - 0.5°C. The problem is that each data frame in the list has a different mean value (I hope I'm being clear), so I want to filter by the mean of a column, but this mean changes for every data frame.
How can I solve this problem? I'm an amateur in R, so anything is helpful. Thank you in advance.
This is a short version of my list:
structure(list(structure(list(Date = structure(c(1646434800,
1646434800, 1646434800, 1646434800, 1646434800, 1646434800, 1646434800,
1646434800, 1646434800, 1646434800), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = c(0.875, 0.5, 0.1875, -0.1875, -0.5, -0.8125,
-1.125, -1.375, -1.625, -1.875)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(Date = structure(c(1646521200,
1646521200, 1646521200, 1646521200, 1646521200, 1646521200, 1646521200,
1646521200, 1646521200, 1646521200, 1646521200), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = c(3.75, 3.75, 3.6875, 3.6875, 3.6875, 3.6875,
3.6875, 3.625, 3.625, 3.625, 3.625)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(Date = structure(c(1646607600,
1646607600, 1646607600, 1646607600, 1646607600, 1646607600, 1646607600,
1646607600, 1646607600, 1646607600, 1646607600), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = c(3.6875, 3.6875, 3.6875, 3.6875, 3.6875, 3.625,
3.625, 3.625, 3.625, 3.625, 3.625)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))), ptype = structure(list(Date = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = numeric(0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)), class = c("vctrs_list_of", "vctrs_vctr",
"list"))
CodePudding user response:
You can do this in several ways. Suppose mydata is the list that you provided in the question.
- In dplyr, you can first bind the rows of all the data frames in mydata to create a single data frame, then group it by Date, and then apply the filter to each group, keeping the rows whose V4 is within 0.5 of that group's mean. The result is a data frame.
library(dplyr)

do.call(rbind, mydata) %>%
  group_by(Date) %>%
  # mean(V4) is computed separately for each Date group
  filter((V4 <= mean(V4) + 0.5) &
         (V4 >= mean(V4) - 0.5))
# A tibble: 25 x 2
# Groups: Date [3]
# Date V4
# <dttm> <dbl>
# 1 2022-03-05 06:00:00 -0.188
# 2 2022-03-05 06:00:00 -0.5
# 3 2022-03-05 06:00:00 -0.812
# 4 2022-03-06 06:00:00 3.75
# 5 2022-03-06 06:00:00 3.75
# 6 2022-03-06 06:00:00 3.69
# 7 2022-03-06 06:00:00 3.69
# 8 2022-03-06 06:00:00 3.69
# 9 2022-03-06 06:00:00 3.69
# 10 2022-03-06 06:00:00 3.69
# ... with 15 more rows
- In base R, you can define a function that filters a single data frame, and then apply that function to mydata. The result is a list of data frames.
myfilter <- function(df) {
  # keep rows whose V4 lies within 0.5 of this data frame's own mean
  cond <- (df$V4 <= mean(df$V4) + 0.5) & (df$V4 >= mean(df$V4) - 0.5)
  result <- df[cond, ]
  return(result)
}
lapply(mydata, myfilter)
# [[1]]
# # A tibble: 3 x 2
# Date V4
# <dttm> <dbl>
# 1 2022-03-05 06:00:00 -0.188
# 2 2022-03-05 06:00:00 -0.5
# 3 2022-03-05 06:00:00 -0.812
#
# [[2]]
# # A tibble: 11 x 2
# Date V4
# <dttm> <dbl>
# 1 2022-03-06 06:00:00 3.75
# 2 2022-03-06 06:00:00 3.75
# 3 2022-03-06 06:00:00 3.69
# 4 2022-03-06 06:00:00 3.69
# 5 2022-03-06 06:00:00 3.69
# 6 2022-03-06 06:00:00 3.69
# 7 2022-03-06 06:00:00 3.69
# 8 2022-03-06 06:00:00 3.62
# 9 2022-03-06 06:00:00 3.62
# 10 2022-03-06 06:00:00 3.62
# 11 2022-03-06 06:00:00 3.62
#
# [[3]]
# # A tibble: 11 x 2
# Date V4
# <dttm> <dbl>
# 1 2022-03-07 06:00:00 3.69
# 2 2022-03-07 06:00:00 3.69
# 3 2022-03-07 06:00:00 3.69
# 4 2022-03-07 06:00:00 3.69
# 5 2022-03-07 06:00:00 3.69
# 6 2022-03-07 06:00:00 3.62
# 7 2022-03-07 06:00:00 3.62
# 8 2022-03-07 06:00:00 3.62
# 9 2022-03-07 06:00:00 3.62
# 10 2022-03-07 06:00:00 3.62
# 11 2022-03-07 06:00:00 3.62
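
If the tolerance or the column name might change later, the base R filter can be generalised a little. The following is only a minimal sketch under my own assumptions: filter_near_mean, its col, tol and na.rm arguments, and the toy list are hypothetical names and made-up values for illustration, not part of your data. It uses abs(x - mean) <= tol, which is equivalent to the two-sided condition above, and it drops rows where the reading is NA.

```r
# Hypothetical helper: keep rows whose value in `col` lies within `tol`
# of that data frame's own mean.
filter_near_mean <- function(df, col = "V4", tol = 0.5, na.rm = TRUE) {
  x <- df[[col]]
  centre <- mean(x, na.rm = na.rm)            # mean of this data frame only
  keep <- !is.na(x) & abs(x - centre) <= tol  # NA readings are dropped
  df[keep, , drop = FALSE]
}

# Toy stand-in for `mydata`: two days with made-up readings.
toy <- list(
  data.frame(Date = as.Date("2022-03-05"), V4 = c(1, 0.25, 0, -0.25, -1)),
  data.frame(Date = as.Date("2022-03-06"), V4 = c(3.75, 3.5, NA, 3.25))
)

filtered <- lapply(toy, filter_near_mean)  # list of filtered data frames
combined <- do.call(rbind, filtered)       # one data frame, if you prefer that
```

With your list you would call lapply(mydata, filter_near_mean), or pass a different tolerance with lapply(mydata, filter_near_mean, tol = 1).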