Targets (the only strings I want to match): "2019,3,1", "2019,03,01", "2019.03.01", "2019-03-01", " '21/3/1"
year<-c("2019,3,1", "2019,03,01", "2019.03.01", "2019-03-01", " '21/3/1", "2019,3-1", "2019-03=01", "2019,03.01", "2019/03-01", "2019-350-01")
grep("",year,value=T)
I tried
grep("[20 ']19([,./-]0?[3])[,./-](0?[1])$",year,value=T)
but the result still includes "2019,3-1", "2019,03.01" and "2019/03-01", whose two separators differ, and " '21/3/1" is not matched at all.
Answer:
You can try this:
year<-c("2019,3,1", "2019,03,01", "2019.03.01", "2019-03-01", " '21/3/1", "2019,3-1", "2019-03=01", "2019,03.01", "2019/03-01", "2019-350-01")
grep("\\d{2,4}([,./-])\\d{1,2}\\1{1}\\d{1,2}",year,value=T)
Detail:
\\d{2,4} : 2 to 4 digits, the year
([,./-]) : one separator character, captured as group 1
\\d{1,2} : 1 or 2 digits, the month
\\1{1} : the same character that group 1 captured (the {1} quantifier is redundant; \\1 alone means the same thing)
\\d{1,2} : 1 or 2 digits, the day
Because \\1 has to repeat whatever separator was captured first, strings with mixed separators such as "2019,3-1" do not match.
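The pattern above is not anchored, so it would also accept a date embedded in longer text such as "x2019-03-01y". A stricter variant is sketched below; it assumes (my assumption, not stated in the question) that the only thing allowed before the date is optional whitespace plus an optional apostrophe, as in " '21/3/1". It adds ^ and $ anchors and drops the redundant {1}. The name `strict` is just for illustration.

```r
year <- c("2019,3,1", "2019,03,01", "2019.03.01", "2019-03-01", " '21/3/1",
          "2019,3-1", "2019-03=01", "2019,03.01", "2019/03-01", "2019-350-01")

# ^\\s*'?    : optional leading whitespace and an optional apostrophe (assumed prefix)
# \\d{2,4}   : year
# ([,./-])   : first separator, captured as group 1
# \\d{1,2}   : month
# \\1        : the same separator again
# \\d{1,2}$  : day, with nothing after it
strict <- "^\\s*'?\\d{2,4}([,./-])\\d{1,2}\\1\\d{1,2}$"

# Returns only the five target strings
grep(strict, year, value = TRUE)

# The unanchored pattern accepts a date surrounded by other text; the anchored one does not
grepl("\\d{2,4}([,./-])\\d{1,2}\\1\\d{1,2}", "x2019-03-01y")  # TRUE
grepl(strict, "x2019-03-01y")                                 # FALSE
```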
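I usually use regex101 to visualize patterns, but it has no R flavor. Converting a pattern tested there (for example with the Python flavor) to R mostly means doubling the backslashes inside an ordinary R string: the pattern \d is written "\\d" in R, because the R parser consumes one level of backslashes before the regex engine sees the string.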
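Since R 4.0.0 there is also a way to avoid the doubling altogether: raw string literals of the form r"(...)", in which backslashes are kept literally. A minimal check that both spellings produce the same pattern (the names `escaped` and `raw_pat` are only for illustration):

```r
# Requires R >= 4.0.0, which introduced raw strings r"(...)"
year <- c("2019,3,1", "2019,03,01", "2019.03.01", "2019-03-01", " '21/3/1",
          "2019,3-1", "2019-03=01", "2019,03.01", "2019/03-01", "2019-350-01")

escaped <- "\\d{2,4}([,./-])\\d{1,2}\\1\\d{1,2}"  # ordinary string: backslashes doubled
raw_pat <- r"(\d{2,4}([,./-])\d{1,2}\1\d{1,2})"   # raw string: pattern pasted as-is

identical(escaped, raw_pat)  # TRUE: both hold the same characters

# Either one can be passed to grep()
grep(raw_pat, year, value = TRUE)
```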
Hope this is useful.
Answer:
Unless you really need a regular-expression solution, you could use the ymd() function from the lubridate package.
library(lubridate)
ymd(year)
Its output:
[1] "2019-03-01" "2019-03-01" "2019-03-01" "2019-03-01" "2021-03-01"
[6] "2019-03-01" "2019-03-01" "2019-03-01" "2019-03-01" NA
Warning message:
1 failed to parse.
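The one that failed to parse is "2019-350-01", which clearly can't be interpreted directly as a date. Note that ymd() also parses the mixed-separator strings such as "2019,3-1", so it normalizes the dates rather than filtering out the non-targets.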
Answer:
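As others noted, it depends on how strict you want to be about what counts as a date. If you are happy to treat any non-digit between the numbers as the year/month/day separator and still want to use a regex, you can replace every non-digit with / and let as.Date() reject what it can't parse: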
as.Date(gsub("[^0-9]", "/", year), format = "%Y/%m/%d")
It converts every non-digit character to /, so it returns NA for the string that starts with a space and an apostrophe (" '21/3/1") and for the one with month 350 ("2019-350-01").
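A possible way to keep all five targets is to combine the backreference pattern from the first answer with the separator normalization used here. This is only a sketch under one stated assumption: two-digit years are read with %y, which per R's strptime rules maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so '21 becomes 2021. The names `pat`, `kept`, `norm`, `yr`, `short` and `d` are hypothetical.

```r
year <- c("2019,3,1", "2019,03,01", "2019.03.01", "2019-03-01", " '21/3/1",
          "2019,3-1", "2019-03=01", "2019,03.01", "2019/03-01", "2019-350-01")

# Keep only strings whose two separators agree, extracting just the matched date part
# (regmatches() with regexpr() drops non-matching elements; " '21/3/1" becomes "21/3/1")
pat  <- "\\d{2,4}([,./-])\\d{1,2}\\1\\d{1,2}"
kept <- regmatches(year, regexpr(pat, year))

# Normalize every separator to "/"
norm <- gsub("[^0-9]", "/", kept)

# Parse with %Y first, then re-parse the two-digit years with %y
yr    <- sub("/.*", "", norm)
d     <- as.Date(norm, format = "%Y/%m/%d")
short <- nchar(yr) == 2
d[short] <- as.Date(norm[short], format = "%y/%m/%d")

data.frame(original = year[grepl(pat, year)], date = d)
```

The filtering step does the work the question asks for (rejecting mixed separators), while as.Date() still catches impossible values such as month 35.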