Hey, I'm new to R and working on a small project in RStudio, and I need some help. I have data that looks similar to the following:
x=training 1 - Monday- 12h30-15h00
Saturday 16h-20h
Training 2 - Friday-06h-08h0
training 1 - Tuesday - 13h30-15h00
Sunday 16h-20h
Training 3 - Thursday-9h00-10h00
x is a column from a data frame.
My question is: how do I extract a specific word such as Sunday, Monday, Tuesday, etc.?
It should be like:
if x contains Saturday then that row should show Saturday in the New_column
if x contains Sunday then that row should show Sunday in the New_column
if x contains Tuesday then that row should show Tuesday in the New_column
I created a string that contains all weekdays
weekdays <- paste0(weekdays(seq(Sys.Date(), by =1,length = 7)), collapse = "|")
Suggestion 1:
Here I try to extract the weekdays from the column My_Data$Traininghour:
My_Data$JOUR<- sub(sprintf('.*(%s).*', weekdays), '\\1',My_Data$Traininghour )
It gives the My_Data$JOUR column exactly the same content as the column My_Data$Traininghour.
Suggestion 2:
My_Data$JOUR<-regmatches(My_Data$Traininghour, regexpr (weekdays, My_Data$Traininghour))
Suggestion 2 gives the following error:
Assigned data `regmatches(My_Data$Traininghour, regexpr (weekdays, My_Data$Traininghour))` must be compatible with existing data.
x Existing data has 4903 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
Suggestion 3:
My_Data$JOUR <-stringr::str_extract(My_Data$Traininghour, weekdays)
Suggestion 3 returns NA in every row of the column My_Data$JOUR.
I'm not sure what I'm doing wrong.
CodePudding user response:
Create a string that contains all the weekdays to use as the regex pattern.
weekdays <- paste0(weekdays(seq(Sys.Date(), by =1,length = 7)), collapse = "|")
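One thing worth checking: weekdays() returns day names in the session's locale, so on a non-English system the generated pattern would not contain "Monday", "Tuesday", and so on. That could explain why nothing matched in the question, but it is only a guess, since the locale is not shown. A minimal sketch that hard-codes the English names instead (day_names and day_pattern are made-up names, and x is rebuilt from the sample rows in the question):

```r
# Sample values copied from the question.
x <- c("training 1 - Monday- 12h30-15h00",
       "Saturday 16h-20h",
       "Training 2 - Friday-06h-08h0",
       "training 1 - Tuesday - 13h30-15h00",
       "Sunday 16h-20h",
       "Training 3 - Thursday-9h00-10h00")

# Hypothetical names: English day names are hard-coded so the pattern
# does not depend on the locale that weekdays() uses.
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")
day_pattern <- paste(day_names, collapse = "|")

# For comparison, print the locale-dependent names this session produces.
locale_days <- weekdays(seq(Sys.Date(), by = 1, length.out = 7))
print(locale_days)

# Check which elements of x contain one of the English day names.
has_day <- grepl(day_pattern, x)
print(has_day)
```

If print(locale_days) shows names in another language, the locale is the likely culprit and day_pattern can be used in place of the generated string.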
In base R we can extract the weekdays from the x vector as follows:
sub(sprintf('.*(%s).*', weekdays), '\\1', x)
[1] "Monday" "Saturday" "Friday" "Tuesday" "Sunday" "Thursday"
or even
regmatches(x, regexpr(weekdays, x))
[1] "Monday" "Saturday" "Friday" "Tuesday" "Sunday" "Thursday"
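Note that regmatches() combined with regexpr() returns only the elements that matched, so the result can be shorter than the input and cannot always be assigned straight to a data frame column. A rough sketch of one way to keep the lengths aligned, filling NA where no day is found (the "Rest day" row and the name day_pattern are invented for illustration; My_Data, Traininghour and JOUR follow the question):

```r
# Small stand-in for the question's data frame; "Rest day" is an invented
# row with no weekday, added to show how non-matches are handled.
My_Data <- data.frame(
  Traininghour = c("training 1 - Monday- 12h30-15h00",
                   "Saturday 16h-20h",
                   "Training 2 - Friday-06h-08h0",
                   "training 1 - Tuesday - 13h30-15h00",
                   "Sunday 16h-20h",
                   "Training 3 - Thursday-9h00-10h00",
                   "Rest day"),
  stringsAsFactors = FALSE
)

# Hypothetical pattern with hard-coded English day names.
day_pattern <- "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"

# regexpr() gives the match position per row, or -1 when nothing matches.
m <- regexpr(day_pattern, My_Data$Traininghour)

# Start with NA everywhere, then fill only the rows that matched.
My_Data$JOUR <- NA_character_
My_Data$JOUR[m != -1] <- regmatches(My_Data$Traininghour, m)

print(My_Data)
```

Rows without a weekday stay NA, which is the same behaviour stringr::str_extract() gives for non-matches.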
It is simpler to use the stringr package, as below:
stringr::str_extract(x, weekdays)
[1] "Monday" "Saturday" "Friday" "Tuesday" "Sunday" "Thursday"