I have a character vector that looks like the table below. I would like to extract the dates from it and convert them with as.Date. For example, row one would be 09-11-2021. The trailing number in each string is the comment count, not part of the date.
<chr>
1 By Leigh-Ann Butler, Shannon Cobb, Michael R. DonaldsonNov 9, 20213 Comments
2 By Leigh-Ann Butler, Shannon Cobb, Michael R. DonaldsonNov 8, 20212 Comments
3 By Rick AndersonNov 4, 202114 Comments
4 By Victoria Ficarra, Rob JohnsonNov 3, 20215 Comments
5 By Roger C. SchonfeldNov 1, 202123 Comments
6 By Joseph EspositoOct 29, 20211 Comment
7 By Brigitte ShullOct 20, 20216 Comments
example.data <- c("By Leigh-Ann Butler, Shannon Cobb, Michael R. DonaldsonNov 9, 20213 Comments",
"By Leigh-Ann Butler, Shannon Cobb, Michael R. DonaldsonNov 8, 20212 Comments",
"By Rick AndersonNov 4, 202114 Comments",
"By Victoria Ficarra, Rob JohnsonNov 3, 20215 Comments")
CodePudding user response:
You could use
as.Date(gsub(".+(\\w{3}\\s\\d{1,2},\\s\\d{4}).*", "\\1", example.data), format = "%b %d, %Y")
#> [1] "2021-11-09" "2021-11-08" "2021-11-04" "2021-11-03"
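If the one-liner is hard to read, the same idea can be split into an extraction step and a parsing step. This is only a sketch under a few assumptions: the month is always a capitalised three-letter English abbreviation, the names `pattern`, `date.strings` and `dates` are made up for illustration, and the `Sys.setlocale()` call is there because `%b` is matched against the month names of the current locale.

```r
example.data <- c(
  "By Leigh-Ann Butler, Shannon Cobb, Michael R. DonaldsonNov 9, 20213 Comments",
  "By Rick AndersonNov 4, 202114 Comments"
)

# %b depends on the locale; the "C" locale uses English month abbreviations.
# Optional if the session already runs in an English locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# Step 1: find the "Mon D, YYYY" substring.
# [0-9]{4} stops after four digits, so the comment count glued to the year is left out.
pattern <- "[A-Z][a-z]{2} [0-9]{1,2}, [0-9]{4}"
date.strings <- regmatches(example.data, regexpr(pattern, example.data))

# Step 2: parse the extracted text into Date objects.
dates <- as.Date(date.strings, format = "%b %d, %Y")
dates
#> [1] "2021-11-09" "2021-11-04"
```

Note that `regmatches()` silently drops elements that do not match, so the result can be shorter than the input if some strings contain no date.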
CodePudding user response:
strcapture(".*(\\D{3})\\s+(\\d{1,2}),\\s+(\\d{4}).*",
           example.data, proto = list(mon = "", day = 0L, year = 0L)) |>
  transform(date = as.Date(paste(mon, day, year), format = "%b %d %Y"))
#   mon day year       date
# 1 Nov   9 2021 2021-11-09
# 2 Nov   8 2021 2021-11-08
# 3 Nov   4 2021 2021-11-04
# 4 Nov   3 2021 2021-11-03
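One thing that may be worth knowing about the strcapture approach: strings that do not match the pattern are kept as rows filled with NA, so the result stays aligned with the input. A small sketch, where the vector `x` and its second element are made up purely for illustration:

```r
# Hypothetical input: the second element has no date in it.
x <- c("By Rick AndersonNov 4, 202114 Comments", "no date here")

parts <- strcapture(".*(\\D{3})\\s+(\\d{1,2}),\\s+(\\d{4}).*", x,
                    proto = list(mon = "", day = 0L, year = 0L))

# Non-matching rows hold NA in every column, and as.Date() turns
# the pasted "NA NA NA" into NA as well.
parts$date <- as.Date(paste(parts$mon, parts$day, parts$year),
                      format = "%b %d %Y")
nrow(parts)
#> [1] 2
```

The column types come from `proto`, which is why `day` and `year` come back as integers rather than character.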