R: Error in charToDate(x) : character string is not in a standard unambiguous format

Time:09-22

I've got a large data set with numeric, character, and a few date variables.

exampledata <- structure(list(DOB = structure(c(18155, 18164, 
18785, 18328, 18314, 18307, 18324), class = "Date"), date_today_ppt_SEEQ = structure(c(18155, 
18164, 18785, 18328, 18314, 18307, 18324), class = "Date"), switching_home = c("Sometimes", 
"Most of the time", "Sometimes", "Sometimes", "Rarely", "Sometimes", 
"Rarely"), single_lang_environm_home = c(80, 0, 100, 75, 95, 
70, 30), dual_lang_environm_home = c(20, 60, 0, 23, 0, 20, 70
), dense_code_sw_home = c(0, 40, 0, 2, 5, 10, 0), between_sentence_sw_home = c("Sometimes", 
"Most of the time", "Sometimes", "Sometimes", "Never", "Sometimes", 
"Rarely"), within_sentence_sw_home = c("Sometimes", "Most of the time", 
"Most of the time", "Rarely", "Rarely", "Rarely", "Sometimes"
)), row.names = c(NA, 7L), class = "data.frame")

I'm trying to recode the character values to numeric values across the whole data frame using:

exampledata[exampledata == "Always"] <- 100 
exampledata[exampledata == "Frequently"] <- 75
exampledata[exampledata == "Most of the time"] <- 75 
exampledata[exampledata == "Sometimes"] <- 50 
exampledata[exampledata == "Rarely"] <- 25 
exampledata[exampledata == "Never"] <- 0 

When I try to do that, I get the error:

Error in charToDate(x) : 
  character string is not in a standard unambiguous format

I suspect it has to do with the date columns in my dataset (which comes from an xlsx file). I read that it might be a problem with the locale or the date format, so I tried a few things:

exampledata$DOB <- openxlsx::convertToDate(exampledata$DOB)
exampledata$DOB <- as.Date(exampledata$DOB, format = "%d/%m/%y") # recorded as DD/MM/YYYY
exampledata$DOB <- lubridate::ymd(exampledata$DOB, locale = "English")
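For reference, here is a minimal sketch of parsing DD/MM/YYYY text. It assumes DOB had been imported as character strings, which the dput() output above does not show, and the values in `dob_text` are hypothetical. `%Y` matches a four-digit year, while `%y` is meant for two-digit years.

```r
# Hypothetical text input in DD/MM/YYYY form
dob_text <- c("16/09/2019", "25/09/2019")

# %d = day of month, %m = month, %Y = four-digit year
dob <- as.Date(dob_text, format = "%d/%m/%Y")
print(dob)
```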

Someone suggested using mutate, so I also tried:

exampledata <- mutate(exampledata, DOB = as.Date(DOB, "%d/%m/%y"))
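A quick check, using two DOB values taken from the dput() output, suggests why none of these conversions changed anything. DOB is already of class Date, and as.Date() applied to a Date returns it unchanged, so the format argument is never used.

```r
# Rebuild two DOB values as they appear in the dput() output (days since 1970-01-01)
dob <- as.Date(c(18155, 18164), origin = "1970-01-01")

# as.Date() on an object that is already a Date returns it as-is;
# the format string is ignored
same <- as.Date(dob, format = "%d/%m/%y")
print(identical(dob, same))
```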

When I run:

> class(exampledata$DOB)
[1] "Date" 

So it is clearly a Date. However, when I open the data frame in the viewer to explore it visually and hover the cursor over the column, "column 1: unknown" appears, which makes me think it didn't convert to the expected (?) date format.

I read through similar questions, but I'm not sure why the column shows up as a Date when I run class() and still causes problems. Also, I'm supposedly only touching values in the character variables, so I'm not sure why the dates cause a problem at all. And people mention a "standard unambiguous format", but I couldn't find anywhere what these standard unambiguous date formats actually are.
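A minimal sketch of what appears to be happening, in base R only. `exampledata == "Always"` compares every column, including the Date columns, with the string. When a Date is compared with a character value, R converts the string with as.Date(). Without an explicit format, as.Date() only tries the formats in its `tryFormats` argument, which default to "%Y-%m-%d" and "%Y/%m/%d". Those two are the "standard unambiguous" formats the message refers to, and "Always" matches neither.

```r
d <- as.Date("2019-09-16")

# Strings in one of the default formats are converted and compared without complaint
print(d == "2019-09-16")
print(d == "2019/09/16")

# Any other string triggers the charToDate() error; capture its message
msg <- tryCatch(d == "Always", error = function(e) conditionMessage(e))
print(msg)
```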

Finally, when I created the reproducible example with dput(), I could see that my dates are stored as numbers, but when I print the column it shows dates, so I'm really confused:

exampledata$DOB
[1] "2019-09-16" "2019-09-25" "2021-06-07" "2020-03-07" "2020-02-22" "2020-02-15" "2020-03-03"
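Internally, a Date is a number of days since 1970-01-01 with a "Date" class attribute. dput() shows the stored numbers and the class, while print() formats them as calendar dates. A small sketch using the first DOB value from the dput() output:

```r
# 18155 is the first DOB value in the dput() output
x <- as.Date(18155, origin = "1970-01-01")

print(x)           # printed as a calendar date
print(unclass(x))  # the stored number of days since 1970-01-01
print(class(x))
```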

If anyone has an idea, I'd be glad for some help here.

For reference, here is my version info (OS is Windows):

> R.version.string
[1] "R version 4.0.3 (2020-10-10)"

Answer:

Create a named vector and use it as a lookup table to do the replacement:

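library(dplyr)
# lookup table: names are the labels, values are the scores
nm1 <- setNames(c(100, 75, 75, 50, 25, 0),
    c("Always", "Frequently", "Most of the time", "Sometimes", "Rarely", "Never"))
# only character columns are recoded, so the Date columns are never compared with a string
exampledata %>%
    mutate(across(where(is.character), ~ nm1[.x]))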

-output

    DOB date_today_ppt_SEEQ switching_home single_lang_environm_home dual_lang_environm_home dense_code_sw_home
1 2019-09-16          2019-09-16             50                        80                      20                  0
2 2019-09-25          2019-09-25             75                         0                      60                 40
3 2021-06-07          2021-06-07             50                       100                       0                  0
4 2020-03-07          2020-03-07             50                        75                      23                  2
5 2020-02-22          2020-02-22             25                        95                       0                  5
6 2020-02-15          2020-02-15             50                        70                      20                 10
7 2020-03-03          2020-03-03             25                        30                      70                  0
  between_sentence_sw_home within_sentence_sw_home
1                       50                      50
2                       75                      75
3                       50                      75
4                       50                      25
5                        0                      25
6                       50                      25
7                       25                      50
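The same lookup idea also works in base R. The sketch below builds a small two-column stand-in for `exampledata` (a hypothetical subset, not the full data) and restricts the replacement to character columns, so no Date column is ever compared with a label string. Labels that are missing from the lookup table would become NA.

```r
# Lookup table: names are the labels, values are the scores
scores <- c("Always" = 100, "Frequently" = 75, "Most of the time" = 75,
            "Sometimes" = 50, "Rarely" = 25, "Never" = 0)

# Hypothetical stand-in for exampledata: one Date column, one character column
exampledata <- data.frame(
  DOB = as.Date(c(18155, 18164), origin = "1970-01-01"),
  switching_home = c("Sometimes", "Most of the time"),
  stringsAsFactors = FALSE
)

# Find the character columns and translate only those via the lookup table
chr_cols <- vapply(exampledata, is.character, logical(1))
exampledata[chr_cols] <- lapply(exampledata[chr_cols],
                                function(x) unname(scores[x]))
print(exampledata)
```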

Answer:

Here is a dplyr approach with mutate across:

library(dplyr)

exampledata %>%
  mutate(across(c(switching_home, between_sentence_sw_home, within_sentence_sw_home),
                ~ case_when(. == "Always" ~ 100,
                            . == "Frequently" ~ 75,
                            . == "Most of the time" ~ 75,
                            . == "Sometimes" ~ 50,
                            . == "Rarely" ~ 25,
                            . == "Never" ~ 0,
                            TRUE ~ NA_real_)))
         DOB date_today_ppt_SEEQ switching_home single_lang_environm_home dual_lang_environm_home dense_code_sw_home between_sentence_sw_home
1 2019-09-16          2019-09-16             50                        80                      20                  0                       50
2 2019-09-25          2019-09-25             75                         0                      60                 40                       75
3 2021-06-07          2021-06-07             50                       100                       0                  0                       50
4 2020-03-07          2020-03-07             50                        75                      23                  2                       50
5 2020-02-22          2020-02-22             25                        95                       0                  5                        0
6 2020-02-15          2020-02-15             50                        70                      20                 10                       50
7 2020-03-03          2020-03-03             25                        30                      70                  0                       25
  within_sentence_sw_home
1                      50
2                      75
3                      75
4                      25
5                      25
6                      25
7                      50
