Home > Enterprise >  Changing a column from character type to date type produces NAs
Changing a column from character type to date type produces NAs

Time:08-25

I have a small dataset with less than 200 rows.

hiv <- data.frame(id=1:6, ENTERED.DATE=c("Mar.03.2021", "Oct.16.2020", "Oct.19.2020", "Mar.09.2021", "Mar.10.2021", "Nov.16.2020"))

The dates are not chronological but recorded haphazardly. The file is .xlsx . Changing the date column from character type to date type produces NAs

library(readxl)
hiv <- read_xlsx("hiv.xlsx", sheet = "Sheet1", col_names = TRUE, na = "NA")
#> New names:
#> * `` -> `...25`
names(hiv) <- make.names(names(hiv))
date <- as.Date(as.character(hiv$ENTERED.DATE), "%d%m%Y")
head(date)
#> [1] NA NA NA NA NA NA

I have read some of the solutions on stackoverflow but could not resolve the issue.

CodePudding user response:

You'll need to use the right date format for your data: %d%m%Y is date month year, but your data has the format monthabb.day.year, which would require %h.%d.%Y:

as.Date(c("Mar.03.2021", "Oct.16.2020", "Oct.19.2020", "Mar.09.2021", "Mar.10.2021", "Nov.16.2020"), "%h.%d.%Y")

Or in your data frame:

date <- as.Date(hiv$ENTERED.DATE, "%h.%d.%Y")

Output:

"2021-03-03" "2020-10-16" "2020-10-19" "2021-03-09" "2021-03-10" "2020-11-16"

CodePudding user response:

In the R Documentation for as.Date they already describe this problem and provide a solution.

First you have to use the right format for your date: %b.%m.%Y and then you have to set the C locale as described below:

hiv <- data.frame(id=1:6, ENTERED_DATE=c("Mar.03.2021", "Oct.16.2020", "Oct.19.2020", "Mar.09.2021", "Mar.10.2021", "Nov.16.2020"))

Sys.setlocale("LC_TIME","C")
#> [1] "C"

as.Date(hiv$ENTERED_DATE, "%b.%d.%Y")
#> [1] "2021-03-03" "2020-10-16" "2020-10-19" "2021-03-09" "2021-03-10"
#> [6] "2020-11-16"

Created on 2022-08-24 with reprex v2.0.2

  •  Tags:  
  • r
  • Related