I am trying to select values by dates (day) from an imported excel sheet.
Convert to R date
Date <-as.Date(Df$Calendar.day, origin = "1899-12-30")
This yields 24 entries per day with the correct dates. I aggregate the values for each day
Df_agg <- aggregate(.~Date, Df, sum)
This yields the correct dates, and the sum of values for each day, as desired.
Date
2022-01-01
2022-01-02
.
.
.
2022-12-31
However, when I try to filter for a certain date, this works only for the first date (2022-01-01). Other specific dates do not work.
Df_agg %>%
select(Date) %>%
filter(Date== as.Date("2022-05-08"))
[1] Date
<0 rows> (or 0-length row.names)
However, larger/smaller than works:
Df_agg %>%
select(Date) %>%
filter(Date> as.Date("2022-05-08"))
Date
1 2022-05-08
2 2022-05-09
3 2022-05-10
4 2022-05-11
5 2022-05-12
6 2022-05-13
7 2022-05-14
8 2022-05-15
9 2022-05-16
What can be the problem?
Sample Df_agg:
Date | value |
---|---|
2022-01-01 | 135.2 |
2022-01-02 | 130.6 |
2022-01-03 | 121.0 |
dput(Df_aggr) (Example)
structure(list(Date = structure(c(18993, 18994.0416666666, 18995.0416666665,
18996.0416666665, 18997.0416666664, 18998.0416666664, 18999.0416666663,
19000.0416666663, 19001.0416666662, 19002.0416666661, 19003.0416666661,
class = "Date"), value = c(135.192291666667,
130.261541666667, 121.005041666667, 121.005041666667, 121.005041666667,
121.005041666667, 121.771375, 131.027875, 130.261541666667, 121.005041666667,
121.771375, 131.027875, 130.261541666667, 121.005041666667, 121.005041666667,
121.005041666667, 121.005041666667, 121.771375, 126.863458333333
)), row.names = c(NA, -365L), class = "data.frame")
CodePudding user response:
I found a solution, using the "Hour.of.Day" column instead. (E.g., 2022-05-08 00:00:00)
Df[["Hour.of.day"]] <-
as.POSIXct(Df[["Hour.of.day"]] * (60*60*24)
, origin="2022-01-01"
, tz="Europe/Stockholm")
Df<- Df %>%
rename("Hour" = "Hour.of.day")
Date_Input <- '2022-05-08'
Df %>%filter(as.Date(Hour) == as.Date(Date_Input))
This results in values for all 24 hours of 22-05-08. Since it is not possible to use the hours to aggregate, I create an additional column ("dummy") with value 1 for all rows:
dfdate <- aggregate(.~dummy, Df, sum)
Perhaps not the most elegant solution, but it works and has the added advantage of being able to select intervals of several days with different hours.