Removing words between characters and blank lines in R

Time:10-06

I've got a dataframe with a column full of cells that look like this:

"***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges"
"***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk"
"***ORDER LIST***\n\n\nitem1: wine\nitem2: milk"

I am trying to sanitize each cell by removing the whole line beginning with the word Customer or, if that line is not there, the blank lines that take its place.

I'd like to end up with sanitized text data like this:

"***ORDER LIST***\nitem1: apples\nitem2: oranges"
"***ORDER LIST***\nitem1: wine\nitem2: milk"
"***ORDER LIST***\nitem1: wine\nitem2: milk"

Using gsub, is there a way to remove both the blank lines and the whole line containing Customer?

Thanks

CodePudding user response:

Try something like:

text <- c("***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges",
          "***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk",
          "***ORDER LIST***\n\n\nitem1: wine\nitem2: milk")

# Replace with "" rather than " " so no stray spaces are left behind
gsub("Customer: .*?\\n|\\n\\n", "", text)

[1] "***ORDER LIST***\nitem1: apples\nitem2: oranges" "***ORDER LIST***\nitem1: wine\nitem2: milk"
[3] "***ORDER LIST***\nitem1: wine\nitem2: milk"
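If "Customer" could also appear in the middle of some other line, a stricter variant may be safer. The following is only a sketch, assuming PCRE is acceptable (perl = TRUE) and that the Customer line always ends with a newline; the name cleaned is made up for illustration:

```r
text <- c("***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges",
          "***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk",
          "***ORDER LIST***\n\n\nitem1: wine\nitem2: milk")

# (?m) lets ^ match at the start of every line, not just the start of the string.
# First alternative: a whole line starting with "Customer:" plus its newline.
# Second alternative: an empty line (just its newline).
cleaned <- gsub("(?m)^Customer:[^\\n]*\\n|^\\n", "", text, perl = TRUE)

# cat() prints the real line breaks, which makes the result easier to check
cat(cleaned[1], "\n")
```

Unlike the pattern above, this removes every blank line one at a time, so it does not depend on there being exactly two of them.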

CodePudding user response:

Does this work for you?

gsub("(.*\\*).*?(\nitem.*)", "\\1\\2", text)
[1] "***ORDER LIST***\nitem1: apples\nitem2: oranges" "***ORDER LIST***\nitem1: wine\nitem2: milk"     
[3] "***ORDER LIST***\nitem1: wine\nitem2: milk"
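Since the question is about a dataframe column, here is a rough sketch of applying the cleanup in place. The data frame orders, its columns id and notes, and the helper strip_customer are hypothetical names, and the pattern is just one option (PCRE via perl = TRUE). gsub is vectorised, so no loop over rows is needed:

```r
# Hypothetical data frame standing in for the real one
orders <- data.frame(
  id = 1:3,
  notes = c("***ORDER LIST***\nCustomer: Lucille\nitem1: apples\nitem2: oranges",
            "***ORDER LIST***\nCustomer: Frank and Sally\nitem1: wine\nitem2: milk",
            "***ORDER LIST***\n\n\nitem1: wine\nitem2: milk"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the Customer line, or a pair of consecutive newlines
strip_customer <- function(x) {
  gsub("Customer:[^\\n]*\\n|\\n\\n", "", x, perl = TRUE)
}

# Overwrite the column with the cleaned text
orders$notes <- strip_customer(orders$notes)
```

If the column is a factor rather than character, it would probably need as.character() first.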