I'm new to R and trying to replicate my current excel reports in R. I have a multi-column csv file with a column containing a timestamp in the format of "4/18/2022, 2:15 PM" (copied directly from the datafile). Ultimately I need to extract the week number and day number for each row in the csv.
I've gotten this far in converting the char to POSIX.
datetest = ("4/18/2022, 12:53:22 AM")
datetest1 <- mdy_hms(datetest)
print(datetest1)
sapply(datetest1, class)
which produces
> datetest = ("4/18/2022, 12:53:22 AM")
> datetest1 <- mdy_hms(datetest)
> print(datetest1)
[1] "2022-04-18 00:53:22 UTC"
> sapply(datetest1, class)
[,1]
[1,] "POSIXct"
[2,] "POSIXt"
But my problem is when I try to convert the entire column
#import CSV
RawData <- read.csv("data.csv",header = TRUE, ",")
#convert TS to POSIX
#column header is Date.Time.Opened
escRawData[['Date.Time.Opened']] <- mdy_hms(escRawData[['Date.Time.Opened']])
which gives me this
> escRawData <-read.csv("report1650637729324.csv",header = TRUE, ",")
> #convert TS to POSIX
> escRawData[['Date.Time.Opened']] <- mdy_hms(escRawData[['Date.Time.Opened']])
Warning message:
All formats failed to parse. No formats found.
which has cleared all the data from the dataframe. I used sapply to check the datatype in the CSV and I know that the data is "character."
I've read/watched a number of tutorials describing lubridate and the POSIXlt commands all giving more or less the same error of "no formats found" so obviously, I am missing something basic. Any help would be appreciated.
CodePudding user response:
With your current code, one faulty date entry (format-wise or other) breaks the whole column's conversion. You could inspect the raw data beforehand, or transform them rowwise and check for failed conversions (NAs) like:
library(dplyr)
library(lubridate)
converted_data <-
escRawData %>%
rowwise %>% ## important
mutate(date_converted = mdy_hms(Date.Time.Opened))
## which rows went wrong?
converted_data %>%
filter(is.na(date_converted))
CodePudding user response:
Ok Everyone, I found out what I was doing wrong. @KU99's comment helped me figure it out. My actual data file did not include seconds in the time stamp. When I changed to using
escRawData[['Date.Time.Opened']] <- mdy_hm(escRawData[['Date.Time.Opened']])
it worked perfectly.