Conditionally formatting a column with mutate and regex in R

Time:11-24

I'm brand new to R and to programming in general. I have a column containing dates stored as strings. Some are in the "01 January 2020" format; others have only the month and year (e.g. "January 2020"). I want to mutate them into a new field, adding "01" in front of every date that is in the month-year format, and then use lubridate to parse the result into dates.

This is what I've tried. I'm trying to extract the first character of the Date column; if it is an upper-case letter, I prepend "01" to the value. I am using the tidyverse packages, including dplyr.

df %>% mutate(new_date = ifelse(str_sub(Date, start = 1, end = 1)== "[:upper:]"), paste('01', Date, sep = ' '), new_date = Date)

I'm getting the error message "no is missing", but I thought I had included new_date = Date to keep the current formatting.

Thank you for your help!

CodePudding user response:

This can be done in many ways.

base R using lookahead and backreference:

sub("(^)(?=[A-Za-z]+)", "\\101 ", date, perl = TRUE)
[1] "01 January 2020"  "01 January 2020"  "12 February 1999" "01 March 2033"

base R using only backreference:

sub("(^[A-Za-z]+)", "01 \\1", date, perl = TRUE)

dplyr and stringr using the same logic:

library(dplyr)
library(stringr)

data.frame(date) %>%
  mutate(date = str_replace(date, "(^)(?=[A-Za-z]+)", "\\101 "))

If you do insist on using ifelse:

library(dplyr)
library(stringr)

data.frame(date) %>% 
  mutate(date = ifelse(str_detect(date, "^[:upper:]"),
                       sub("^", "01 ", date),
                       date))
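
As for the "no is missing" error in the question, it most likely comes from where the parentheses close. ifelse( is closed right after the test, so paste(...) and new_date = Date are handed to mutate() instead of being the yes and no arguments of ifelse(). On top of that, == compares the first character with the literal string "[:upper:]" rather than treating it as a regex, so the test is always FALSE and ifelse() goes looking for the missing no. Below is a minimal sketch of the original attempt with both points corrected; the data frame df and its Date column are rebuilt here from the sample data purely for illustration.

```r
library(dplyr)
library(stringr)

# Illustrative data frame; the column name Date follows the question
df <- data.frame(
  Date = c("01 January 2020", "January 2020", "12 February 1999", "March 2033"),
  stringsAsFactors = FALSE
)

fixed <- df %>%
  mutate(new_date = ifelse(
    str_detect(str_sub(Date, 1, 1), "[[:upper:]]"),  # test: regex match instead of ==
    paste("01", Date),                               # yes: prepend the day
    Date                                             # no: keep the value unchanged
  ))

fixed$new_date
```

All three arguments now sit inside the same ifelse() call, and new_date is named only once, on the left-hand side inside mutate().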

Data:

date <- c("01 January 2020","January 2020", "12 February 1999", "March 2033")

CodePudding user response:

Here is a non-regex option: parse the strings into date-times with parsedate, then format them

library(parsedate)
format(parse_date(date), '%d %B %Y')
[1] "01 January 2020"  "01 January 2020"  "12 February 1999" "01 March 2033"  

data

date <- c("01 January 2020","January 2020", "12 February 1999", "March 2033")
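
Since the question mentions lubridate as the next step, here is a rough sketch of how the padded strings could then be turned into Date objects. It assumes an English locale for the month names, and the variable names padded and parsed are made up for this example.

```r
library(lubridate)

date <- c("01 January 2020", "January 2020", "12 February 1999", "March 2033")

# Prepend "01 " to strings that start with a letter (month-year only)
padded <- sub("^([A-Za-z])", "01 \\1", date)

# dmy() parses day-month-year strings into Date objects;
# month names are matched against the current locale (English assumed)
parsed <- dmy(padded)

class(parsed)
```

If the column lives in a data frame, the same two steps can go inside a single mutate() call.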