I am trying to plot water elevation data against river stage and precipitation data. My water elevation data is reported hourly, while I only have daily precipitation and river stage values. Because they are all in the same data frame, I placed each daily measurement at 12:00:00 so that it is plotted in the middle of the day.
My data frame looks like this:
Date River Rain Well 1
1/1/2021 00:00 NA NA 422
1/1/2021 01:00 NA NA 421.8
1/1/2021 02:00 NA NA 421.7
1/1/2021 03:00 NA NA 421
1/1/2021 04:00 NA NA 421.3
1/1/2021 05:00 NA NA 421
1/1/2021 06:00 NA NA 421
1/1/2021 07:00 NA NA 420.7
1/1/2021 08:00 NA NA 420.6
1/1/2021 09:00 NA NA 420.9
1/1/2021 10:00 NA NA 421.4
1/1/2021 11:00 NA NA 421.4
1/1/2021 12:00 430 1.5 421
My issue arises from the format of the Date column, which is reported as it comes from Excel, in YYYY-MM-DD HH:MM:SS format. When I first load the Excel file, sapply and lapply report the column as numeric, in the same format as in Excel.
However, if I convert it using as.Date, it is returned in YYYY-MM-DD format, which leaves 24 identical YYYY-MM-DD values in my Date column for each day. I was using the following code to transform it:
df <- df %>% transform(Date = as.Date(Date))
I have also tried to use:
df <- df %>% ymd_hms(Date)
However, this gives an error and replaces all values in the Date column with NA.
When I plot the data after using as.Date, only a single measurement is shown for each day instead of the hourly data. However, when I leave the Date column untransformed, I get the error:
Error: Invalid input: date_trans works with objects of class Date only
All other columns are numeric. I would really appreciate any help.
Answer:
The issue is that converting with as.Date drops the hours. To keep them, use as.POSIXct. Also, your dates are not in YYYY-MM-DD format, so you have to describe their layout through the format argument. I'm not sure whether this alone will fix the issue with your plot, though.
library(dplyr)
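# Parse the character timestamps as date-times so the hours are kept.
# "%d/%m/%Y" reads the values as day/month/year; use "%m/%d/%Y" if the
# month comes first in your file.
df %>%
  transform(Date = as.POSIXct(Date, format = "%d/%m/%Y %H:%M"))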
#> Date River Rain Well.1
#> 1 2021-01-01 00:00:00 NA NA 422.0
#> 2 2021-01-01 01:00:00 NA NA 421.8
#> 3 2021-01-01 02:00:00 NA NA 421.7
#> 4 2021-01-01 03:00:00 NA NA 421.0
#> 5 2021-01-01 04:00:00 NA NA 421.3
#> 6 2021-01-01 05:00:00 NA NA 421.0
#> 7 2021-01-01 06:00:00 NA NA 421.0
#> 8 2021-01-01 07:00:00 NA NA 420.7
#> 9 2021-01-01 08:00:00 NA NA 420.6
#> 10 2021-01-01 09:00:00 NA NA 420.9
#> 11 2021-01-01 10:00:00 NA NA 421.4
#> 12 2021-01-01 11:00:00 NA NA 421.4
#> 13 2021-01-01 12:00:00 430 1.5 421.0
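To make the difference concrete, here is a minimal sketch comparing the two conversions on two of the timestamps above. The tz = "UTC" argument is an assumption added here, not something stated in the question. The last part covers the case where the column really did arrive as numeric: it assumes the values are Excel serial day numbers (Windows date system, origin 1899-12-30), which may or may not match your file.

```r
# Two timestamps from the same day, as character strings.
x <- c("1/1/2021 00:00", "1/1/2021 12:00")

# as.Date keeps only the calendar day, so both values collapse to one date.
d <- as.Date(x, format = "%d/%m/%Y")
n_days <- length(unique(d))

# as.POSIXct keeps the time of day. tz = "UTC" is an assumption made here
# to avoid daylight-saving gaps; use the logger's real time zone if known.
p <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
gap_hours <- as.numeric(difftime(p[2], p[1], units = "hours"))

# ASSUMPTION: if Date is numeric, it may hold Excel serial day numbers.
# Convert days to seconds and count from Excel's origin, 1899-12-30.
serial <- 44197.5
from_excel <- as.POSIXct(round(serial * 86400), origin = "1899-12-30", tz = "UTC")
excel_label <- format(from_excel, "%Y-%m-%d %H:%M")
```

If the column is numeric, passing it to ymd_hms or to as.POSIXct with a format string will not work, because those expect character input in the stated layout.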
DATA
df <- structure(list(Date = c(
"1/1/2021 00:00", "1/1/2021 01:00", "1/1/2021 02:00",
"1/1/2021 03:00", "1/1/2021 04:00", "1/1/2021 05:00", "1/1/2021 06:00",
"1/1/2021 07:00", "1/1/2021 08:00", "1/1/2021 09:00", "1/1/2021 10:00",
"1/1/2021 11:00", "1/1/2021 12:00"
), River = c(
NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 430L
), Rain = c(
NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1.5
), `Well 1` = c(
422, 421.8,
421.7, 421, 421.3, 421, 421, 420.7, 420.6, 420.9, 421.4, 421.4,
421
)), class = "data.frame", row.names = c(NA, -13L))
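As for the plot, the error "date_trans works with objects of class Date only" is what ggplot2 raises when scale_x_date is applied to a column that is not of class Date. The following is a rough sketch, assuming ggplot2 is the plotting package and that the x axis was set with scale_x_date; it uses scale_x_datetime instead, which is the matching scale for POSIXct. The data frame plot_df and its column name Well are hypothetical stand-ins for the data above.

```r
library(ggplot2)

# Hypothetical miniature of the data above, with Date already POSIXct.
plot_df <- data.frame(
  Date  = as.POSIXct("2021-01-01 00:00", tz = "UTC") + 3600 * 0:12,
  River = c(rep(NA, 12), 430),
  Well  = c(422, 421.8, 421.7, 421, 421.3, 421, 421,
            420.7, 420.6, 420.9, 421.4, 421.4, 421)
)

# Daily values exist only at 12:00, so keep just the non-missing rows
# for that layer instead of letting the NAs break it up.
daily <- subset(plot_df, !is.na(River))

p <- ggplot(plot_df, aes(x = Date)) +
  geom_line(aes(y = Well)) +                          # hourly well elevation
  geom_point(data = daily, aes(y = River),
             colour = "blue") +                       # daily river stage
  scale_x_datetime(date_labels = "%d %b %H:%M") +     # POSIXct axis
  labs(x = NULL, y = "Elevation / stage")
p
```

Rain is on a different scale from the elevations, so it would probably be easier to read in a separate panel than on the same axis.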