I hope everyone is having a blast. I have run into this challenge:
I want to extract one portion of a string in the following manner:
- The string may or may not have a dot or may have plenty of them
- I want to extract the part of the string before the first dot; if there is no dot, I want the whole string
- I want to use a regex to achieve this
test<-c("This_This-This.Not This",
"This_This-This.not_.this",
"This_This-This",
"this",
"this.Not This")
Since I need to use a regex, I have been trying this expression:
str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
but what I get is:
> str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
[1] "This_This-This.Not This" "This_This-This.not_.this"
[3] "This_This-This" "this"
[5] "this.Not This"
>
My desired output is:
"This_This-This"
"This_This-This"
"This_This-This"
"this"
"this"
This is my thought process behind the regex:
str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
(^[a-zA-Z].+) = this captures the group before the dot: the string always starts with a letter, upper or lower case, followed by any other characters, hence the .+
[\.\b]? = a dot or a word boundary that may or may not be present, hence the ?
It is not giving me what I want, and I would be very happy if you could help me understand my mistake here. Thank you so much!
CodePudding user response:
Actually, rather than extracting, a regex replacement should work well here:
test <- c("This_This-This.Not This",
"This_This-This.not_.this",
"This_This-This",
"this",
"this.Not This")
output <- sub("\\..*", "", test)
output
[1] "This_This-This" "This_This-This" "This_This-This" "this"
[5] "this"
Replacement works well here because it no-ops for any input not having any dots, in which case the original string is returned.
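To check the no-op claim, here is a small base R sketch. It assumes the same test vector as above; has_dot is a helper name introduced only for this illustration.

```r
test <- c("This_This-This.Not This",
          "This_This-This.not_.this",
          "This_This-This",
          "this",
          "this.Not This")

# fixed = TRUE treats "." as a literal dot rather than "any character"
has_dot <- grepl(".", test, fixed = TRUE)

# Remove everything from the first literal dot to the end of the string
output <- sub("\\..*", "", test)

# Strings without a dot are returned unchanged (should print TRUE)
print(identical(output[!has_dot], test[!has_dot]))

# Strings with a dot are cut at the first dot
print(output[has_dot])
```

Because sub() only replaces the first match and the pattern runs to the end of the string, any later dots are removed together with the rest of the tail.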
CodePudding user response:
My regex is "match anything up to either a dot or the end of the line".
library(stringr)
str_match(test, "^(.*?)(\\.|$)")[, 2]
Result:
[1] "This_This-This" "This_This-This" "This_This-This" "this" "this"
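The lazy quantifier (.*?) is what makes this pattern stop at the first dot. The sketch below contrasts it with a greedy (.*), which is roughly what seems to go wrong in the question's attempt. It is written with base R sub() and perl = TRUE so it runs without stringr; the names lazy and greedy are made up for this illustration.

```r
test <- c("This_This-This.Not This",
          "This_This-This.not_.this",
          "This_This-This",
          "this",
          "this.Not This")

# Lazy: group 1 grows one character at a time and stops as soon as
# a dot or the end of the string can follow it
lazy <- sub("^(.*?)(\\.|$).*", "\\1", test, perl = TRUE)

# Greedy: group 1 grabs the whole string first, and "$" then matches
# at the end, so nothing is ever cut off
greedy <- sub("^(.*)(\\.|$).*", "\\1", test, perl = TRUE)

print(lazy)    # text before the first dot, or the whole string
print(greedy)  # identical to the input
```

With the greedy version the engine never needs to backtrack to a dot, because the end-of-string alternative already satisfies the pattern.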