I need your support while working with dates. While importing an .xls file, the column of dates was correctly converted into numbers by R. Unfortunately some dates are still there in the format: dd/mm/yyyy or d/mm/yyyy or dd/mm/yy. Probably this results from different settings of different os. I don't know. Is there a way to manage this?
Thank you in advance
my_data <- read_excel("my_file.xls")
born_date 18520 30859 16/04/1972 26612 30291 24435 11/02/1964 26/09/1971 18427 23688
Original_dates
14/9/1950
26/6/1984
16/04/1972
9/11/1972
6/12/1982
24/11/1966
11/02/1964
26/09/1971
13/6/1950
CodePudding user response:
Here is one way how we could solve it:
First we define the numeric values only by exlcuden those containing the string /
.
Then we use excel_numeric_to_date
function from janitor
package.
Finally with coalesce
we combine both:
library(dplyr)
library(janitor)
library(lubridate)
df %>%
mutate(x = ifelse(str_detect(born_date, '\\/'), NA_real_, born_date),
x = excel_numeric_to_date(as.numeric(as.character(x)), date_system = "modern"),
born_date = dmy(born_date)) %>%
mutate(born_date = coalesce(born_date, x), .keep="unused")
born_date
1 1950-09-14
2 1984-06-26
3 1972-04-16
4 1972-11-09
5 1982-12-06
6 1966-11-24
7 1964-02-11
8 1971-09-26
9 1950-06-13
10 1964-11-07
data:
df <- structure(list(born_date = c("18520", "30859", "16/04/1972",
"26612", "30291", "24435", "11/02/1964", "26/09/1971", "18427",
"23688")), class = "data.frame", row.names = c(NA, -10L))
CodePudding user response:
This translates the two types of dates. Each returns an NA for those elements not of that type. Then we use coalesce to combine them. This only needs dplyr and no warnings are produced.
library(dplyr)
my_data %>%
mutate(born_date = coalesce(
as.Date(born_date, "%d/%m/%Y"),
as.Date(as.numeric(ifelse(grepl("/",born_date),NA,born_date)), o="1899-12-30"))
)
## born_date
## 1 1950-09-14
## 2 1984-06-26
## 3 1972-04-16
## 4 1972-11-09
## 5 1982-12-06
## 6 1966-11-24
## 7 1964-02-11
## 8 1971-09-26
## 9 1950-06-13
## 10 1964-11-07
HEre is a base R version.
my_data |>
transform(born_date = pmin(
as.Date(born_date, "%d/%m/%Y"),
as.Date(as.numeric(ifelse(grepl("/",born_date),NA,born_date)), o="1899-12-30"),
na.rm = TRUE)
)
Note
The input in reproducible form.
my_data <-
structure(list(born_date = c("18520", "30859", "16/04/1972",
"26612", "30291", "24435", "11/02/1964", "26/09/1971", "18427",
"23688")), class = "data.frame", row.names = c(NA, -10L))