Extracting a specific word from a column in R

Time:04-20

Hey, I'm new to R and working on a small project in RStudio, and I need some help. I have data that looks similar to the following:

x=training 1 - Monday- 12h30-15h00

Saturday 16h-20h

Training 2 - Friday-06h-08h0

training 1 - Tuesday - 13h30-15h00

Sunday 16h-20h

Training 3 - Thursday-9h00-10h00

x is a column from a dataframe.

My question is: how do I extract a specific word such as a day of the week (Sunday, Monday, Tuesday, etc.) from each row?

It should be like:

if x contains Saturday then that row should show Saturday in the New_column

if x contains Sunday then that row should show Sunday in the New_column

if x contains Tuesday then that row should show Tuesday in the New_column

I created a string that contains all weekdays

weekdays <- paste0(weekdays(seq(Sys.Date(), by =1,length = 7)), collapse = "|")

Suggestion 1:

In the following I try extracting weekdays from the column My_Data$Traininghour

My_Data$JOUR<- sub(sprintf('.*(%s).*', weekdays), '\\1',My_Data$Traininghour )

It gives the My_Data$JOUR column exactly the same content that is found in the column My_Data$Traininghour.

Suggestion 2

My_Data$JOUR<-regmatches(My_Data$Traininghour, regexpr (weekdays, My_Data$Traininghour))

Suggestion 2 gives the following error:

Assigned data `regmatches(My_Data$Traininghour, regexpr (weekdays, My_Data$Traininghour))` must be compatible with existing data.

x Existing data has 4903 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.

Suggestion 3

My_Data$JOUR <-stringr::str_extract(My_Data$Traininghour, weekdays)

Suggestion 3 returns NA in every row of the column My_Data$JOUR.

I'm not sure what I'm doing wrong

CodePudding user response:

Create a string that contains all the weekdays to use as the regex pattern.

weekdays <- paste0(weekdays(seq(Sys.Date(), by =1,length = 7)), collapse = "|")
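One caveat, offered as a likely explanation and not a confirmed diagnosis: weekdays() returns day names in the language of the current R session's locale. If the session is not in English, the pattern will contain localized names and match nothing in English text. That would be consistent with all three symptoms in the question (unchanged strings from sub, zero matches from regmatches, NA from str_extract). A minimal sketch that avoids the locale dependency by spelling out the English names; the names day_names and day_pattern are illustrative only, and x is rebuilt here from the sample rows in the question:

```r
# Hard-coded English day names, so the pattern does not depend on the session locale
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")
day_pattern <- paste(day_names, collapse = "|")

# Sample values copied from the question
x <- c("training 1 - Monday- 12h30-15h00",
       "Saturday 16h-20h",
       "Training 2 - Friday-06h-08h0",
       "training 1 - Tuesday - 13h30-15h00",
       "Sunday 16h-20h",
       "Training 3 - Thursday-9h00-10h00")

# Keep only the captured day name from each string
days <- sub(sprintf(".*(%s).*", day_pattern), "\\1", x)
print(days)
```

Printing the pattern built with weekdays() in your own session is a quick way to check whether it actually contains the names you expect.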

In base R we can extract the weekdays from the x vector as follows:

sub(sprintf('.*(%s).*', weekdays), '\\1', x)
[1] "Monday"   "Saturday" "Friday"   "Tuesday"  "Sunday"   "Thursday"

or even

regmatches(x, regexpr(weekdays, x))
[1] "Monday"   "Saturday" "Friday"   "Tuesday"  "Sunday"   "Thursday"
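Note that regmatches() drops elements that have no match, so its result can be shorter than the input and cannot always be assigned directly to a data frame column. This is what the "Assigned data has 0 rows" error in the question reflects. A small sketch of one way to keep the result aligned with the input, filling non-matching rows with NA; the vector x, the "no day here" row, and the name jour are made up for illustration:

```r
# Illustrative input: the second element contains no day name
x <- c("training 1 - Monday- 12h30-15h00", "no day here", "Sunday 16h-20h")
day_pattern <- "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"

# regexpr() returns -1 for elements without a match
m <- regexpr(day_pattern, x)

# Start with NA everywhere, then fill in only the rows that matched
jour <- rep(NA_character_, length(x))
jour[m != -1] <- regmatches(x, m)
print(jour)
```

Because jour always has the same length as x, it can be assigned to a column such as My_Data$JOUR without a size mismatch.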

It is simpler to use the stringr package, whose str_extract returns NA for rows without a match, so the result always has the same length as x:

stringr::str_extract(x, weekdays)
[1] "Monday"   "Saturday" "Friday"   "Tuesday"  "Sunday"   "Thursday"