I have a question similar to this one but instead of having two specific characters to look between, I want to get the text between a space and a specific character. In my example, I have this string:
myString <- "This is my string I scraped from the web. I want to remove all instances of a picture. picture-file.jpg. The text continues here. picture-file2.jpg"
but if I were to do something like this: str_remove_all(myString, " .*jpg)
I end up with
[1] "This"
I know that what's happening is R is finding the first instance of a space and removing everything between that space and ".jpg" but I want it to be the first space immediately before ".jpg". My final result I hope for looks like this:
[1] "This is my string I scraped from the web. I want to remove all instances of a picture. the text continues here.
NOTE: I know that a solution may arise which does what I want, but ends up putting two periods next to each other. I do not mind a solution like that because later in my analysis I am removing punctuation.
CodePudding user response:
You can use
str_remove_all(myString, "\\S*\\.jpg")
Or, if you also want to remove optional whitespace before the "word":
str_remove_all(myString, "\\s*\\S*\\.jpg")
Details:
\s*
- zero or more whitespaces\S*
- zero or more non-whitespaces\.jpg
-.jpg
substring.
To make it case insensitive, add (?i)
at the pattern part: "(?i)\\s*\\S*\\.jpg"
.
If you need to make sure there is no word char after jpg
, add a word boundary: "(?i)\\s*\\S*\\.jpg\\b"