Incorrect day and year format after extracting "DD-MM-YYYY" variable

Time:04-11

I am trying to generate day, month, and year variables based on the string values of a "date" variable, which is formatted as "27-02-2012" or "DD-MM-YYYY".

#Loading packages
library(tidyverse)
library(readxl)
library(writexl)
library(stringr)
library(textclean)
library(lubridate)
#library(zoo)

My variables are stored as follows:

sapply(data_corpus, class)
    post        date    username 
"character" "character" "character"

To extract and generate separate variables for day, month, and year, I ran this:

#Converting date variable
#data_corpus$date <- as_date(data_corpus$date)

But this turns all of the values in the "date" variable into NAs. So I also tried extracting the components directly from the character column, which works for month only:

#Creating day, month, year variables 
data_corpus$day <- day(data_corpus$date)
data_corpus$month <- month(data_corpus$date)
data_corpus$year <- year(data_corpus$date)

However, a date like "27-02-2012" is extracted as shown below. The month is correct, but "year" was taken from the day part of the original "date" value, and I am not sure how the value for "day" was generated.

   "date"        day   month    year
"27-02-2012"      20    2        27
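
The output above is consistent with the string being read in year-month-day order, which is what base R assumes for a character date when no format is given. The following is a minimal base R sketch of that reading, assuming (not confirmed by the post) that the lubridate accessors fall back to the same default parsing for character input. The name misparsed is only illustrative.

```r
# Parse the sample string as if it were "YYYY-MM-DD".
x <- "27-02-2012"
misparsed <- strptime(x, format = "%Y-%m-%d", tz = "UTC")

parts <- c(
  day   = misparsed$mday,         # 20: the first two digits of "2012"
  month = misparsed$mon + 1,      # 2:  mon is zero-based
  year  = misparsed$year + 1900   # 27: year is stored as an offset from 1900
)
print(parts)
```

Under this reading, "27" is taken as the year, "02" as the month, and only the first two digits of "2012" are consumed as the day. The trailing "12" is ignored, which would explain the day value of 20.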

Here is how the variables are stored after creating the 3 variables above:

sapply(data_corpus, class)
      post        date    username         day       month        year 
"character" "character" "character"   "integer"   "numeric"   "numeric" 

CodePudding user response:

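The strings are in day-month-year order, so parse them with dmy() from lubridate before extracting the components: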

library(lubridate)
data_corpus$date <- dmy(data_corpus$date)

Or with base R, passing the format explicitly:

data_corpus$date <- as.Date(data_corpus$date, "%d-%m-%Y")
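
Once the column is a real Date, the accessors from the question return the expected values. Here is a short sketch of the full sequence, using a hypothetical one-row stand-in for data_corpus (the real data is not shown in the post):

```r
library(lubridate)

# Hypothetical one-row stand-in for the poster's data_corpus
data_corpus <- data.frame(date = "27-02-2012", stringsAsFactors = FALSE)

# Parse as day-month-year first, then extract the components
data_corpus$date  <- dmy(data_corpus$date)
data_corpus$day   <- day(data_corpus$date)    # 27
data_corpus$month <- month(data_corpus$date)  # 2
data_corpus$year  <- year(data_corpus$date)   # 2012

print(data_corpus)
```

The order matters: the extraction has to run on the parsed Date column, not on the original character strings.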