I want to import my data from excel to R. In the excel file I have columns with NA values, incomplete and full dates i.e. it says 2022-05-00 in one cell and in another cell on that same column I have a date that says 2020-04-24 which is a full date. I have several columns with this specific issue.
However I believe I can get around this issue by importing all date columns as characters into R, but how do I do that? Meaning when I do import I want all my date columns to be character and not Date-format to avoid this issue.
When I try to import them as numeric that will produce NA values and when I try to import them as character some will turn into numeric values and the incompletes will keep their "normal" formate i.e. 2022-05-00 on that same column.
I am using Rstudios default import library "readxl" right now.
CodePudding user response:
One solution could be:
After importing from excel we could use parse_date
function from
parsedate
package: See ?parsedate
library(dplyr)
df <- tibble(dates = c("2022-05-00", "2020-04-24"))
library(parsedate)
df %>%
mutate(dates = parse_date(dates))
dates
<dttm>
1 2022-01-05 00:00:00
2 2020-04-24 00:00:00
CodePudding user response:
If you want to force a specific format for imported variables, you can use the col_types
option in read_excel
. Say you have a file with a numeric ID variable you want to keep numeric, and three columns with dates, you can go with:
library(readxl)
df <- read_excel("myfile.xlsx", col_types = c("numeric", "text", "text", "text"))
If you want to import all your variables as text, you can just use col_types = "text"
, as it is recycled.
That said, if you want to use those variables as actual dates, you will still have the problem that incomplete dates will be set to missing or may be parsed incorrectly, if that's your problem you should post a specific question for that issue.