How to filter all the elements of a list in R when the filter uses the mean of each element?

Time:04-09

I have a dataset with temperature readings for each day, so I grouped them by date. In the end I have a list with one data frame per day. Now I want to filter all of these data frames by a range: the mean temperature of that day (data frame) ± 0.5 °C. The problem is that each data frame in the list has a different mean value (I hope I'm being clear). So I want to filter on the mean of a column, but that mean changes for every data frame.

How can I solve this problem? I'm an amateur in R, so anything is helpful. Thank you in advance.

Here is a short version of my list:

structure(list(structure(list(Date = structure(c(1646434800, 
1646434800, 1646434800, 1646434800, 1646434800, 1646434800, 1646434800, 
1646434800, 1646434800, 1646434800), tzone = "", class = c("POSIXct", 
"POSIXt")), V4 = c(0.875, 0.5, 0.1875, -0.1875, -0.5, -0.8125, 
-1.125, -1.375, -1.625, -1.875)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(Date = structure(c(1646521200, 
1646521200, 1646521200, 1646521200, 1646521200, 1646521200, 1646521200, 
1646521200, 1646521200, 1646521200, 1646521200), tzone = "", class = c("POSIXct", 
"POSIXt")), V4 = c(3.75, 3.75, 3.6875, 3.6875, 3.6875, 3.6875, 
3.6875, 3.625, 3.625, 3.625, 3.625)), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(Date = structure(c(1646607600, 
1646607600, 1646607600, 1646607600, 1646607600, 1646607600, 1646607600, 
1646607600, 1646607600, 1646607600, 1646607600), tzone = "", class = c("POSIXct", 
"POSIXt")), V4 = c(3.6875, 3.6875, 3.6875, 3.6875, 3.6875, 3.625, 
3.625, 3.625, 3.625, 3.625, 3.625)), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame"))), ptype = structure(list(Date = structure(numeric(0), tzone = "", class = c("POSIXct", 
"POSIXt")), V4 = numeric(0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))

Answer:

You can do this in several ways. Suppose mydata is the list that you provided in the question.
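To experiment without the full dataset, a small stand-in for mydata can be built from one long data frame with base R's split(), which returns one data frame per date. This is a minimal sketch: the data frame `temps`, its values, and the name `mydata_toy` are hypothetical, and split() is just one way to build such a list. It also shows why a single fixed cutoff cannot work, since every day has its own mean.

```r
# Hypothetical long data frame: a few readings from two days (V4 = temperature)
temps <- data.frame(
  Date = as.POSIXct(c(rep("2022-03-05", 4), rep("2022-03-06", 3)), tz = "UTC"),
  V4   = c(0.875, -0.1875, -0.5, -1.875, 3.75, 3.6875, 3.625)
)

# split() returns a list with one data frame per distinct Date value
mydata_toy <- split(temps, temps$Date)

# Each day has its own mean, so one fixed cutoff cannot suit every day
day_means <- sapply(mydata_toy, function(df) mean(df$V4))
print(day_means)
```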

  1. With dplyr, you can first bind all the data frames in mydata into a single data frame, then group it by Date and apply the filter within each group. The result is a single (grouped) data frame.
library(dplyr)
# stack the daily data frames, then keep rows within 0.5 of each day's mean
do.call(rbind, mydata) %>% 
  group_by(Date) %>% 
  filter((V4 <= mean(V4) + 0.5) & (V4 >= mean(V4) - 0.5))

# # A tibble: 25 x 2
# # Groups:   Date [3]
#    Date                    V4
#    <dttm>               <dbl>
#  1 2022-03-05 06:00:00 -0.188
#  2 2022-03-05 06:00:00 -0.5  
#  3 2022-03-05 06:00:00 -0.812
#  4 2022-03-06 06:00:00  3.75 
#  5 2022-03-06 06:00:00  3.75 
#  6 2022-03-06 06:00:00  3.69 
#  7 2022-03-06 06:00:00  3.69 
#  8 2022-03-06 06:00:00  3.69 
#  9 2022-03-06 06:00:00  3.69 
# 10 2022-03-06 06:00:00  3.69 
# # ... with 15 more rows
  2. In base R, you can define a function that filters a single data frame and then apply it to every element of mydata with lapply(). The result is a list of data frames.
myfilter <- function(df) {
  m <- mean(df$V4)  # this day's mean temperature
  # keep only the rows within 0.5 of the mean
  cond <- (df$V4 <= m + 0.5) & (df$V4 >= m - 0.5)
  result <- df[cond, ]
  return(result)
}

lapply(mydata, myfilter)
# [[1]]
# # A tibble: 3 x 2
#   Date                    V4
#   <dttm>               <dbl>
# 1 2022-03-05 06:00:00 -0.188
# 2 2022-03-05 06:00:00 -0.5  
# 3 2022-03-05 06:00:00 -0.812
# 
# [[2]]
# # A tibble: 11 x 2
#    Date                   V4
#    <dttm>              <dbl>
#  1 2022-03-06 06:00:00  3.75
#  2 2022-03-06 06:00:00  3.75
#  3 2022-03-06 06:00:00  3.69
#  4 2022-03-06 06:00:00  3.69
#  5 2022-03-06 06:00:00  3.69
#  6 2022-03-06 06:00:00  3.69
#  7 2022-03-06 06:00:00  3.69
#  8 2022-03-06 06:00:00  3.62
#  9 2022-03-06 06:00:00  3.62
# 10 2022-03-06 06:00:00  3.62
# 11 2022-03-06 06:00:00  3.62
# 
# [[3]]
# # A tibble: 11 x 2
#    Date                   V4
#    <dttm>              <dbl>
#  1 2022-03-07 06:00:00  3.69
#  2 2022-03-07 06:00:00  3.69
#  3 2022-03-07 06:00:00  3.69
#  4 2022-03-07 06:00:00  3.69
#  5 2022-03-07 06:00:00  3.69
#  6 2022-03-07 06:00:00  3.62
#  7 2022-03-07 06:00:00  3.62
#  8 2022-03-07 06:00:00  3.62
#  9 2022-03-07 06:00:00  3.62
# 10 2022-03-07 06:00:00  3.62
# 11 2022-03-07 06:00:00  3.62
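Both approaches keep the rows whose V4 lies within 0.5 of their own day's mean. The two comparisons can be folded into a single test, abs(V4 - mean(V4)) <= 0.5, which is equivalent to mean - 0.5 <= V4 <= mean + 0.5. A minimal base R sketch of that variant follows; the function name `filter_near_mean`, its `tol` argument, and the toy list `days` (built from the first day's values in the question plus a short made-up second day) are hypothetical.

```r
# Hypothetical helper: keep rows whose V4 is within `tol` of this data frame's own mean.
# abs(x - m) <= tol is the same test as (x >= m - tol) & (x <= m + tol).
filter_near_mean <- function(df, tol = 0.5) {
  m <- mean(df$V4)
  df[abs(df$V4 - m) <= tol, , drop = FALSE]  # drop = FALSE keeps a data frame
}

# Hypothetical toy list: day 1 uses the question's values, day 2 is made up
days <- list(
  data.frame(V4 = c(0.875, 0.5, 0.1875, -0.1875, -0.5,
                    -0.8125, -1.125, -1.375, -1.625, -1.875)),
  data.frame(V4 = c(3.75, 3.6875, 3.625))
)

filtered <- lapply(days, filter_near_mean)
sapply(filtered, nrow)  # number of rows kept per day
```

Because the mean is computed inside the function, each list element is automatically filtered against its own mean, which is exactly the per-day behaviour the question asks for.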
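If the per-day list is only an intermediate step, the same filter can also be applied directly to the original long data frame. Base R's ave() returns a vector as long as its input in which each element is replaced by its group's mean. This is a sketch of an alternative, not part of the answer above; the data frame `temps` and its values are hypothetical.

```r
# Hypothetical long data frame with several readings per day
temps <- data.frame(
  Date = as.Date(c("2022-03-05", "2022-03-05", "2022-03-05",
                   "2022-03-06", "2022-03-06", "2022-03-06")),
  V4   = c(0.875, -0.1875, -1.875, 3.75, 3.6875, 3.625)
)

# For every row, the mean V4 of all rows that share its Date
day_mean <- ave(temps$V4, temps$Date, FUN = mean)

# Keep rows within 0.5 degrees of their own day's mean
near_mean <- temps[abs(temps$V4 - day_mean) <= 0.5, ]
print(near_mean)
```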