Home > Software engineering >  How to calculate the total minutes when values were greater than 5?
How to calculate the total minutes when values were greater than 5?

Time:12-16

I have some time series data where values are given every 15 minutes. For this dataset we can assume that the values do not change within that 15 minute period; therefore if the value was 1 at 8:15am, and 2 at 8:30am, then we assume that from 8:15-8:30 the value was 1, and from 8:30-8:45 the value was 2.

Now I want to calculate the monthly total of minutes when values were greater than 5.

My dataframe looks something like this (Except with 15 columns of values).

# Create a, b, c, d variables
a <- c(
  "06-25-2021 08:00:00 AM",
  "06-25-2021 08:15:00 AM",
  "06-25-2021 08:30:00 AM",
  "06-25-2021 08:45:00 AM",
  "07-25-2021 08:45:00 AM",
  "07-25-2021 09:00:00 AM",
  "08-25-2021 08:45:00 AM",
  "08-25-2021 09:00:00 AM",
  "09-25-2021 09:15:00 AM",
  "09-25-2021 09:30:00 AM"
)
b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6)
c = c(6, 10, 8, NA, 8, 5, NA, 8, 7, 2)
d = c(1, 3 ,NA, 6, 4, 8, 2, 4, NA, 10)

df =
  tibble(a, b, c, d) 

df$a = as.POSIXlt(df$a, format = "%m-%d-%Y%H:%M:%S", tz = 'EST')

but I want it to look like this

Name = c("b", "c", "d")
June = c(15, 45, 15 )
Jul = c(NA, 15, 15)
Aug = c(15, 15, NA)
Sept = c(45, 30, 15)

df_2 = tibble (Name,June,Jul,Aug,Sept)

I'm not sure how to sum and filter for this when it's a time series. Anyone have any suggestions?

CodePudding user response:

Based on the description - perhaps we replace the values in columns 'b' to 'd' that are less than or equal to 5 will NA, reshape to 'long' format with pivot_longer, get the month from the 'a' column and reshape back to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
df %>% 
   mutate(across(2:4, ~ replace(., . <=5, NA))) %>% 
   pivot_longer(cols = b:d, names_to = 'Name', values_drop_na = TRUE) %>%
   mutate(a = format(a, '%b')) %>%
   pivot_wider(names_from = a, values_from = value, 
    values_fill = 0, values_fn = list(value = function(x) length(x) * 15))

CodePudding user response:

Using aggregate, replace everything less or equal to 5 with zero and calculate sums. The rest is bringing into shape.

aggregate(cbind(b, c, d) ~ a$mon, df, \(x) sum(replace(x, x <= 5, 0), na.rm=T)) |>
  t() |> as.data.frame() |>
  (\(x) setNames(x, month.abb[unlist(x[1, ])])[-1, ])() |>
  (\(x) cbind(Name=rownames(x), x))() |> `rownames<-`(NULL)  ## optional
#   Name May Jun Jul Aug
# 1    b   0   0   7   6
# 2    c  16   8   8   0
# 3    d   0   8   0  10

Data

df <- structure(list(a = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), min = c(0L, 15L, 30L, 45L, 45L, 0L, 45L, 0L, 15L, 30L
), hour = c(8L, 8L, 8L, 8L, 8L, 9L, 8L, 9L, 9L, 9L), mday = c(25L, 
25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L), mon = c(5L, 5L, 
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L), year = c(121L, 121L, 121L, 121L, 
121L, 121L, 121L, 121L, 121L, 121L), wday = c(5L, 5L, 5L, 5L, 
0L, 0L, 3L, 3L, 6L, 6L), yday = c(175L, 175L, 175L, 175L, 205L, 
205L, 236L, 236L, 267L, 267L), isdst = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), zone = c("EST", "EST", "EST", "EST", "EST", 
"EST", "EST", "EST", "EST", "EST"), gmtoff = c(-18000L, -18000L, 
-18000L, -18000L, -18000L, -18000L, -18000L, -18000L, -18000L, 
-18000L)), class = c("POSIXlt", "POSIXt"), tzone = c("EST", "EST", 
"EST")), b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6), c = c(6, 10, 
8, NA, 8, 5, NA, 8, 7, 2), d = c(1, 3, NA, 6, 4, 8, 2, 4, NA, 
10)), row.names = c(NA, -10L), class = "data.frame")


# [1] "R version 4.1.2 (2021-11-01)"
  • Related