I've got a large data set with numeric, character and a few date variables. Here is a sample:
exampledata <- structure(list(
  DOB = structure(c(18155, 18164, 18785, 18328, 18314, 18307, 18324), class = "Date"),
  date_today_ppt_SEEQ = structure(c(18155, 18164, 18785, 18328, 18314, 18307, 18324), class = "Date"),
  switching_home = c("Sometimes", "Most of the time", "Sometimes", "Sometimes", "Rarely", "Sometimes", "Rarely"),
  single_lang_environm_home = c(80, 0, 100, 75, 95, 70, 30),
  dual_lang_environm_home = c(20, 60, 0, 23, 0, 20, 70),
  dense_code_sw_home = c(0, 40, 0, 2, 5, 10, 0),
  between_sentence_sw_home = c("Sometimes", "Most of the time", "Sometimes", "Sometimes", "Never", "Sometimes", "Rarely"),
  within_sentence_sw_home = c("Sometimes", "Most of the time", "Most of the time", "Rarely", "Rarely", "Rarely", "Sometimes")
), row.names = c(NA, 7L), class = "data.frame")
I'm trying to recode the character values to numeric across the whole data frame using:
exampledata[exampledata == "Always"] <- 100
exampledata[exampledata == "Frequently"] <- 75
exampledata[exampledata == "Most of the time"] <- 75
exampledata[exampledata == "Sometimes"] <- 50
exampledata[exampledata == "Rarely"] <- 25
exampledata[exampledata == "Never"] <- 0
When I try to do that, I get the error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
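A minimal reproduction, assuming a hypothetical two-column data frame `d` (not from the question): comparing a whole data frame with a string runs `==` on every column, and for a Date column R first tries to turn the string into a Date with as.Date(), which is where charToDate() fails. Restricting the comparison to the character columns avoids it.

```r
# hypothetical toy data: one Date column, one character column
d <- data.frame(DOB = as.Date("2019-09-16"), answer = "Sometimes",
                stringsAsFactors = FALSE)

# `d == "Sometimes"` compares every column, so the Date column triggers
# as.Date("Sometimes"), which cannot be parsed and raises the error
res <- try(d == "Sometimes", silent = TRUE)
print(inherits(res, "try-error"))

# comparing only the character columns works
chr <- vapply(d, is.character, logical(1))
print(d[chr] == "Sometimes")
```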
I suspect it has to do with the date columns in my dataset (which comes from an xlsx file), so I've tried a few things, because I read it might be a problem with the locale or the date format:
exampledata$DOB <- openxlsx::convertToDate(exampledata$DOB)
exampledata$DOB <- as.Date(exampledata$DOB, format = "%d/%m/%Y") # recorded as DD/MM/YYYY
exampledata$DOB <- lubridate::ymd(exampledata$DOB, locale = "English")
Someone suggested using mutate(), so I also tried:
library(dplyr)
exampledata <- mutate(exampledata, DOB = as.Date(DOB, "%d/%m/%Y"))
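Since the dput() output shows DOB already has class "Date", these conversions are most likely no-ops. A small sketch, using a hypothetical single date `d`: as.Date() on an object that is already a Date returns it unchanged, and the format argument only matters when parsing character input.

```r
d <- as.Date("2019-09-16")
print(class(d))

# as.Date() on an existing Date dispatches to as.Date.Date(), which ignores
# the format argument and returns the same dates
same <- as.Date(d, format = "%d/%m/%Y")
print(identical(same, d))
```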
When I run:
> class(exampledata$DOB)
[1] "Date"
It clearly shows up as Date. However, when I open my data frame in the viewer to explore it visually and hover the cursor over the variable, "column 1: unknown" appears, which makes me think it didn't convert to the expected (?) date format.
I've read through other people's similar problems, but I'm not sure why the column shows up as Date when I run class() and still causes problems. Also, I'm only trying to change values in the character variables, so I don't see why the dates cause a problem at all. And although people mention it a lot, I couldn't find anywhere what these "standard unambiguous" date formats actually are.
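On the "standard unambiguous format" question: as far as I know, when no format is given, as.Date() on a character string tries the formats in its tryFormats argument, which default to "%Y-%m-%d" and "%Y/%m/%d" (see ?as.Date). Anything else, such as a day-first date, needs an explicit format, and a string that matches no format, like "Sometimes", raises exactly the error above. A short sketch:

```r
# these match the default tryFormats ("%Y-%m-%d", "%Y/%m/%d")
iso   <- as.Date("2019-09-16")
slash <- as.Date("2019/09/16")

# a day-first string needs an explicit format
dmy <- as.Date("16/09/2019", format = "%d/%m/%Y")

# a string that matches no format fails the same way as in the question
bad <- try(as.Date("Sometimes"), silent = TRUE)
print(inherits(bad, "try-error"))
```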
Also, as I was creating the reproducible example using dput(), I could see that my dates are stored as numbers, but when I print the column it shows dates, so I'm really confused:
exampledata$DOB
[1] "2019-09-16" "2019-09-25" "2021-06-07" "2020-03-07" "2020-02-22" "2020-02-15" "2020-03-03"
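This is expected behaviour for the Date class: underneath, a Date is a number of days since 1970-01-01, and only printing formats it as a date. dput() shows the raw numbers plus the class attribute. A small sketch using the first DOB value from the sample:

```r
d <- as.Date("2019-09-16")

# strip the class to see the stored value: days since 1970-01-01
print(unclass(d))   # 18155, the first value in the dput() output
print(typeof(d))    # "double"

# converting the number back; origin is given explicitly for older R versions
back <- as.Date(18155, origin = "1970-01-01")
print(back)
```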
If anyone has an idea, I'd be glad for some help here.
For reference, here is my version info (OS is Windows):
> R.version.string
[1] "R version 4.0.3 (2020-10-10)"
Answer:
Create a named vector that maps each label to its score, and use it to do the replacement on the character columns only:
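library(dplyr)

# lookup table: names are the response labels, values are the scores
nm1 <- setNames(c(100, 75, 75, 50, 25, 0),
                c("Always", "Frequently", "Most of the time", "Sometimes", "Rarely", "Never"))

# recode only the character columns; Date and numeric columns are left alone
exampledata %>%
  mutate(across(where(is.character), ~ nm1[.x]))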
Output:
DOB date_today_ppt_SEEQ switching_home single_lang_environm_home dual_lang_environm_home dense_code_sw_home
1 2019-09-16 2019-09-16 50 80 20 0
2 2019-09-25 2019-09-25 75 0 60 40
3 2021-06-07 2021-06-07 50 100 0 0
4 2020-03-07 2020-03-07 50 75 23 2
5 2020-02-22 2020-02-22 25 95 0 5
6 2020-02-15 2020-02-15 50 70 20 10
7 2020-03-03 2020-03-03 25 30 70 0
between_sentence_sw_home within_sentence_sw_home
1 50 50
2 75 75
3 50 75
4 50 25
5 0 25
6 50 25
7 25 50
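The same named-vector lookup can be written in base R without dplyr. A minimal sketch, assuming a hypothetical small data frame `toy` standing in for exampledata: select the character columns with vapply(), then index the lookup table by their values.

```r
# same lookup table as above
nm1 <- c("Always" = 100, "Frequently" = 75, "Most of the time" = 75,
         "Sometimes" = 50, "Rarely" = 25, "Never" = 0)

# hypothetical toy data standing in for exampledata
toy <- data.frame(
  DOB = as.Date(c("2019-09-16", "2019-09-25")),
  switching_home = c("Sometimes", "Most of the time"),
  single_lang_environm_home = c(80, 0),
  stringsAsFactors = FALSE
)

# index nm1 by the labels in each character column;
# labels missing from nm1 become NA
chr <- vapply(toy, is.character, logical(1))
toy[chr] <- lapply(toy[chr], function(x) unname(nm1[x]))
print(toy)
```

Because the Date column is never compared with a string, charToDate() is never called.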
Answer:
Here is a dplyr approach with mutate() and across(), using case_when() to map each label to its score:
library(dplyr)

exampledata %>%
  mutate(across(c(switching_home, between_sentence_sw_home, within_sentence_sw_home),
                ~ case_when(. == "Always"           ~ 100,
                            . == "Frequently"       ~ 75,
                            . == "Most of the time" ~ 75,
                            . == "Sometimes"        ~ 50,
                            . == "Rarely"           ~ 25,
                            . == "Never"            ~ 0,
                            TRUE                    ~ NA_real_)))
DOB date_today_ppt_SEEQ switching_home single_lang_environm_home dual_lang_environm_home dense_code_sw_home between_sentence_sw_home
1 2019-09-16 2019-09-16 50 80 20 0 50
2 2019-09-25 2019-09-25 75 0 60 40 75
3 2021-06-07 2021-06-07 50 100 0 0 50
4 2020-03-07 2020-03-07 50 75 23 2 50
5 2020-02-22 2020-02-22 25 95 0 5 0
6 2020-02-15 2020-02-15 50 70 20 10 50
7 2020-03-03 2020-03-03 25 30 70 0 25
within_sentence_sw_home
1 50
2 75
3 75
4 25
5 25
6 25
7 50
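The original `[<-` replacement from the question can also be kept if it is applied only to the character columns. A minimal base R sketch, assuming a hypothetical data frame `toy` and lookup vector `scores` (neither comes from the answers above). Assigning a number into a character column stores it as text, so the columns are converted with as.numeric() at the end.

```r
scores <- c("Always" = 100, "Frequently" = 75, "Most of the time" = 75,
            "Sometimes" = 50, "Rarely" = 25, "Never" = 0)

# hypothetical toy data, including a missing answer
toy <- data.frame(
  DOB = as.Date(c("2019-09-16", "2019-09-25")),
  switching_home = c("Sometimes", "Most of the time"),
  within_sentence_sw_home = c("Rarely", NA),
  stringsAsFactors = FALSE
)

chr <- vapply(toy, is.character, logical(1))
sub <- toy[chr]                       # character columns only, no Dates
for (lbl in names(scores)) {
  # NA cells compare as NA; treat them as non-matches
  hit <- !is.na(sub) & sub == lbl
  sub[hit] <- scores[[lbl]]           # stored as text, e.g. "50"
}
# convert the recoded text to numbers and put the columns back
toy[chr] <- lapply(sub, as.numeric)
print(toy)
```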