How to remove characters between space and specific character in R

Time:05-26

I have a question similar to this one but instead of having two specific characters to look between, I want to get the text between a space and a specific character. In my example, I have this string:

myString <- "This is my string I scraped from the web. I want to remove all instances of a picture. picture-file.jpg. The text continues here. picture-file2.jpg"

but if I do something like this: str_remove_all(myString, " .*jpg") I end up with

[1] "This"

I know what's happening: R finds the first space in the string and, because .* is greedy, removes everything from that space through the last "jpg". What I want instead is for the match to start at the space immediately before each ".jpg" file name. The final result I'm hoping for looks like this:

[1] "This is my string I scraped from the web. I want to remove all instances of a picture. The text continues here."

NOTE: A solution that does what I want may leave two periods next to each other. I don't mind that, because I remove punctuation later in my analysis.

Answer:

You can use

str_remove_all(myString, "\\S*\\.jpg")

Or, if you also want to remove optional whitespace before the "word":

str_remove_all(myString, "\\s*\\S*\\.jpg")

Details:

  • \s* - zero or more whitespaces
  • \S* - zero or more non-whitespaces
  • \.jpg - a literal .jpg substring (the backslash escapes the dot so it does not match any character).
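
As a quick check, here is a minimal sketch that applies both patterns to the string from the question. It assumes the stringr package is installed; the variable names name_only and with_space are only illustrative.

```r
library(stringr)

myString <- "This is my string I scraped from the web. I want to remove all instances of a picture. picture-file.jpg. The text continues here. picture-file2.jpg"

# Remove only the file name; the space in front of it is left behind.
name_only <- str_remove_all(myString, "\\S*\\.jpg")

# Also consume any whitespace directly in front of the file name.
with_space <- str_remove_all(myString, "\\s*\\S*\\.jpg")

print(name_only)
print(with_space)
```

With the second pattern, the period that followed picture-file.jpg stays in place, so the result contains "picture.." (the doubled period the question says is acceptable). The first pattern additionally leaves a stray space before that period and a trailing space at the end of the string.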

To make it case insensitive, add (?i) at the start of the pattern: "(?i)\\s*\\S*\\.jpg".

If you need to make sure there is no word char after jpg, add a word boundary: "(?i)\\s*\\S*\\.jpg\\b"
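
To see what (?i) and \\b change, here is a small sketch on a made-up string. The input x and the variable names are hypothetical and not from the question; stringr is assumed to be installed.

```r
library(stringr)

# Hypothetical input: one upper-case extension and one longer
# extension that merely starts with "jpg".
x <- "See IMG_01.JPG here and report.jpgs there"

# Case-sensitive: ".JPG" is skipped, and " report.jpg" is cut out of "report.jpgs".
case_sensitive <- str_remove_all(x, "\\s*\\S*\\.jpg")

# Case-insensitive: ".JPG" is matched too.
ignore_case <- str_remove_all(x, "(?i)\\s*\\S*\\.jpg")

# Word boundary after "jpg": "report.jpgs" is no longer touched.
with_boundary <- str_remove_all(x, "(?i)\\s*\\S*\\.jpg\\b")

print(case_sensitive)
print(ignore_case)
print(with_boundary)
```

Without the word boundary, the match inside "report.jpgs" leaves a dangling "s" glued to the previous word ("ands"), which is the kind of partial match \\b is meant to prevent.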
