Unable to extract date, month and year into three separate columns from a dataframe's "Date" column

Time:09-24

I was trying to use extract from tidyr:

library(dplyr)
library(tidyr)
library(stringr)

# Dataframe has "Date" column and date in the format "dd/mm/yyyy" or "dd/m/yyyy"
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010", "20/02/2010", "25/3/2010", "31/03/2010"))

# extract into three columns
df %>% extract(Date, c("Day", "Month", "Year"), "([^/]+), ([^/]+), ([^)]+)")

But the above code returns:

   Day Month Year
1 <NA>  <NA> <NA>
2 <NA>  <NA> <NA>
3 <NA>  <NA> <NA>
4 <NA>  <NA> <NA>
5 <NA>  <NA> <NA>
6 <NA>  <NA> <NA>

How can I extract the date parts correctly so that the result looks like this?

  Day Month Year
1  10     1 2001
2  15     1 2010
3  15     2 2010
4  20     2 2010
5  25     3 2010
6  31     3 2010

CodePudding user response:

It might be easier to use separate in this case, since the parts are always delimited by "/":

df %>% 
  separate("Date", into=c("Day","Month","Year"), sep="/") %>% 
  mutate(Month=str_replace(Month, "^0",""))

That will keep everything as character values. If you want the values to be numeric, use

df %>% 
  separate("Date", into=c("Day","Month","Year"), sep="/", convert=TRUE)
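As a quick check of the numeric variant, the sketch below rebuilds the sample data from the question and inspects the column classes. The name res is only an illustrative variable, not something from the original answer.

```r
library(tidyr)

# Sample data copied from the question
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010",
                          "20/02/2010", "25/3/2010", "31/03/2010"))

# Split on "/" and let tidyr convert the pieces, so "01" becomes 1
res <- separate(df, "Date", into = c("Day", "Month", "Year"),
                sep = "/", convert = TRUE)

print(res)
print(sapply(res, class))  # class of each new column
```

With convert = TRUE there is no need to strip the leading zero from Month by hand, because the conversion to a number drops it.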

CodePudding user response:

Your regex pattern is off. Use this version:

df %>% extract(Date, c("Day", "Month", "Year"), "(\\d+)/(\\d+)/(\\d+)")
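The pattern in the question separates its capture groups with a comma and a space, while the data uses "/", so nothing matches and every row comes back as NA. A minimal sketch of the digit-based pattern follows. Adding convert = TRUE is my own assumption about what is wanted here: it turns the captured strings into numbers so that "01" becomes 1, as in the expected output. The name res is illustrative.

```r
library(tidyr)

# Sample data copied from the question
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010",
                          "20/02/2010", "25/3/2010", "31/03/2010"))

# Each group captures one or more digits; the groups are separated by "/".
# convert = TRUE runs type conversion on the new columns.
res <- extract(df, Date, c("Day", "Month", "Year"),
               "(\\d+)/(\\d+)/(\\d+)", convert = TRUE)

print(res)
```

Without convert = TRUE the columns stay character and Month keeps any leading zero.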

CodePudding user response:

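We could use lubridate. Note that funs() is deprecated in current dplyr, so a named list of functions is passed to across() instead:

library(lubridate)
library(dplyr)

df %>% 
  mutate(Date = dmy(Date)) %>%   # parse the character column as day/month/year
  mutate(across(Date, list(year = year, month = month, day = day)))

Output:
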
        Date Date_year Date_month Date_day
1 2001-01-10      2001          1       10
2 2010-01-15      2010          1       15
3 2010-02-15      2010          2       15
4 2010-02-20      2010          2       20
5 2010-03-25      2010          3       25
6 2010-03-31      2010          3       31
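If only the three columns from the question are wanted, with the names Day, Month and Year, one possible sketch is to parse the dates once and build the result directly. The names parsed and res are illustrative, and this is a variation on the answer above rather than its original code.

```r
library(lubridate)

# Sample data copied from the question
df <- data.frame(Date = c("10/1/2001", "15/01/2010", "15/2/2010",
                          "20/02/2010", "25/3/2010", "31/03/2010"))

# dmy() parses day/month/year strings, with or without a leading zero
parsed <- dmy(df$Date)

# Pull the numeric components out of the parsed dates
res <- data.frame(Day   = day(parsed),
                  Month = month(parsed),
                  Year  = year(parsed))

print(res)
```

Parsing to a real date first also means invalid inputs such as "31/02/2010" come back as NA with a warning instead of being split silently.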