I am trying to generate day, month, and year variables from the string values of a "date" variable, which is formatted as "DD-MM-YYYY", for example "27-02-2012".
#Loading packages
library(tidyverse)
library(readxl)
library(writexl)
library(stringr)
library(textclean)
library(lubridate)
#library(zoo)
My variables are stored as follows:
sapply(data_corpus, class)
post date username
"character" "character" "character"
To create separate variables for day, month, and year, I first tried converting the date variable:
#Converting date variable
#data_corpus$date <- as_date(data_corpus$date)
But this turns all of the values in the "date" variable into NAs. So I also tried running the following on the original character column, which gives the correct result only for month:
#Creating day, month, year variables
data_corpus$day <- day(data_corpus$date)
data_corpus$month <- month(data_corpus$date)
data_corpus$year <- year(data_corpus$date)
However, a date like "27-02-2012" is extracted as shown below. Month is extracted correctly, but "year" is taken from the day part of the original "date" value, and I am not sure how the value for "day" is generated.
"date" day month year
"27-02-2012" 20 2 27
Here is how the variables are stored after creating the 3 variables above:
sapply(data_corpus, class)
post date username day month year
"character" "character" "character" "integer" "numeric" "numeric"
CodePudding user response:
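The strings are in day-month-year order, so parse them with dmy() from lubridate before calling day(), month(), and year():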
library(lubridate)
data_corpus$date <- dmy(data_corpus$date)
Or with base R, passing the format explicitly:
data_corpus$date <- as.Date(data_corpus$date, "%d-%m-%Y")
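To tie this back to the output in the question, here is a minimal base R sketch. The small data_corpus built here is a made-up stand-in for the real data, and the name misread is only for illustration. It shows why the unparsed string gave day 20 and year 27 (without a format the string is read as year-month-day), and then parses with the explicit format and extracts the three components.

```r
# Hypothetical stand-in for the asker's data_corpus (values are made up)
data_corpus <- data.frame(
  post = c("a", "b"),
  date = c("27-02-2012", "03-11-2015"),
  username = c("u1", "u2"),
  stringsAsFactors = FALSE
)

# Without a format, the string is read as year-month-day:
# "27" becomes the year, "02" the month, and "20" (the start of "2012") the day
misread <- as.POSIXlt("27-02-2012", tz = "UTC")
c(day = misread$mday, month = misread$mon + 1, year = misread$year + 1900)

# Parse with the explicit day-month-year format
data_corpus$date <- as.Date(data_corpus$date, format = "%d-%m-%Y")

# Extract the components as integers
data_corpus$day <- as.integer(format(data_corpus$date, "%d"))
data_corpus$month <- as.integer(format(data_corpus$date, "%m"))
data_corpus$year <- as.integer(format(data_corpus$date, "%Y"))
```

Once the column is a real Date, lubridate's day(), month(), and year() should return the same values as the format() calls above.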