I have a data frame that has over multiple date columns which have been classed as a character. I need to change them to a date or posix class
library(dplyr,lubridate)
date <- c("1/23/2021 12:00:00 AM","1/23/2021 12:00:00 AM","1/23/2021 12:00:00 AM"
,"1/22/2021 12:00:00 AM","1/23/2021 12:00:00 AM","1/23/2021 12:00:00 AM"
,"1/23/2021 12:00:00 AM","1/22/2021 12:00:00 AM","1/20/2021 12:00:00 AM"
,"1/20/2021 12:00:00 AM","1/20/2021 12:00:00 AM","1/22/2021 12:00:00 AM"
,"1/23/2021 12:00:00 AM","1/20/2021 12:00:00 AM","1/20/2021 12:00:00 AM"
,"1/20/2021 12:00:00 AM","1/22/2021 12:00:00 AM","1/23/2021 12:00:00 AM"
,"1/23/2021 12:00:00 AM","1/22/2021 12:00:00 AM","1/23/2021 12:00:00 AM"
,"1/22/2021 12:00:00 AM","1/22/2021 12:00:00 AM","1/23/2021 12:00:00 AM"
,"1/23/2021 12:00:00 AM","1/22/2021 12:00:00 AM","1/22/2021 12:00:00 AM"
,"1/22/2021 12:00:00 AM","1/23/2021 12:00:00 AM","1/23/2021 12:00:00 AM")
a <- rnorm(30)
df <- data.frame(a, date)
# A tibble: 30 × 2
a date
<dbl> <chr>
1 -0.823 1/23/2021 12:00:00 AM
2 -0.312 1/23/2021 12:00:00 AM
3 -1.12 1/23/2021 12:00:00 AM
4 -0.508 1/22/2021 12:00:00 AM
5 0.566 1/23/2021 12:00:00 AM
6 0.704 1/23/2021 12:00:00 AM
7 -0.588 1/23/2021 12:00:00 AM
8 -1.10 1/22/2021 12:00:00 AM
9 -1.10 1/20/2021 12:00:00 AM
10 0.579 1/20/2021 12:00:00 AM
Every approach I have tried has produced NAs. I have used strptime, mdy_hms from lubridate, as.Date from base.
Any direction would be appreciated
CodePudding user response:
?strptime
offers a nice cheatsheet for the abbreviations to be used. This snippet should work in my opinion:
df[["date"]] <- df[["date"]] |> strptime(format = "%m/%d/%Y %I:%M:%S %p")
df[["date"]] |> head()
#> [1] "2021-01-23 CET" "2021-01-23 CET" "2021-01-23 CET" "2021-01-22 CET"
#> [5] "2021-01-23 CET" "2021-01-23 CET"
df[["date"]] |> class()
#> [1] "POSIXlt" "POSIXt"
Edit:
Sorry, I'm not really used to dplyr
yet so I can't really help you with mutate()
. But you needed your date column as a POSIX*
object and that is exactly what you got as far as I understand:
str(df)
#> 'data.frame': 30 obs. of 2 variables:
#> $ a : num -0.0472 0.3096 -0.1849 -0.619 0.1552 ...
#> $ date: POSIXlt, format: "2021-01-23" "2021-01-23" ...
tibble::as_tibble(df)
#> # A tibble: 30 × 2
#> a date
#> <dbl> <dttm>
#> 1 -0.0472 2021-01-23 00:00:00
#> 2 0.310 2021-01-23 00:00:00
#> 3 -0.185 2021-01-23 00:00:00
#> 4 -0.619 2021-01-22 00:00:00
#> 5 0.155 2021-01-23 00:00:00
#> 6 0.275 2021-01-23 00:00:00
#> 7 1.80 2021-01-23 00:00:00
#> 8 0.525 2021-01-22 00:00:00
#> 9 -0.411 2021-01-20 00:00:00
#> 10 0.460 2021-01-20 00:00:00
#> # … with 20 more rows
#> # ℹ Use `print(n = ...)` to see more rows
CodePudding user response:
You can use anydate()
from the library(anytime)
package.
newdate <- anydate(date)
tibble(a,newdate)
A tibble: 30 x 2
a date
<dbl> <date>
1 0.444 2021-01-23
2 -0.0288 2021-01-23
3 -0.607 2021-01-23
4 0.371 2021-01-22
5 0.502 2021-01-23