Remove words where the first character is letter and the rest numbers

Time:01-11

I would like to remove every word in a string that starts with a letter followed by numbers and ends with either a semicolon or a space. For example, given the string

x <- "Z1; D49;  Pay-What-You-Want; A1; Moods; Weather; Social norms, K20"

The desired output is

Pay-What-You-Want; Moods; Weather; Social norms;

Thank you

CodePudding user response:

So let's make it a "vector of strings", because that is easier to work with than a single character value.

# if commas should become semicolons, use gsub() first
x <- gsub("[,]", ";", "Z1; D49;  Pay-What-You-Want; A1; Moods; Weather; Social norms, K20")
# make it a vector
x2 <- trimws(scan(text=x, what="", sep=";"))
# If you want it to be one string (which seems odd but is doable):
(x3 <- paste( x2[!grepl("^[[:alpha:]]\\d+$", x2)] , collapse="; ") )
#[1] "Pay-What-You-Want; Moods; Weather; Social norms"
# Or
(x4 <- x2[!grepl("^[[:alpha:]]\\d+$", x2)] )
#[1] "Pay-What-You-Want" "Moods"             "Weather"           "Social norms"  
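
The split, filter, and paste steps above can also be wrapped in one small function. This is only a minimal sketch and not part of the original answer: the name remove_codes and its pattern argument are hypothetical, and it assumes a "code" is exactly one letter followed by one or more digits.

```r
# Hypothetical helper: drop entries that are one letter followed only by digits.
remove_codes <- function(s, pattern = "^[[:alpha:]][[:digit:]]+$") {
  # Split on commas or semicolons, then trim surrounding whitespace
  parts <- trimws(strsplit(s, "[,;]")[[1]])
  # Keep non-empty entries that do not look like a code
  kept <- parts[nzchar(parts) & !grepl(pattern, parts)]
  paste(kept, collapse = "; ")
}

x <- "Z1; D49;  Pay-What-You-Want; A1; Moods; Weather; Social norms, K20"
remove_codes(x)
# [1] "Pay-What-You-Want; Moods; Weather; Social norms"
```

Anchoring the pattern with $ means an entry such as "B2B marketing" would be kept, since only entries made up entirely of one letter plus digits are dropped.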

CodePudding user response:

My understanding of your comments is that you have a character vector, each element of which is a semicolon-delimited (with some commas) string. If that’s right, then using stringr functions within sapply():

library(stringr)

sapply(
  str_split(x, "(,|;)\\s+"),
  \(.x) str_c(.x[!str_detect(.x, "^\\w\\d+$")], collapse = "; ")
)
# [1] "Pay-What-You-Want; Moods; Weather; Social norms"

Or using base R:

sapply(
  strsplit(x, "(,|;)\\s+"),
  \(.x) paste(.x[!grepl("^\\w\\d+$", .x)], collapse = "; ")
)
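
To illustrate the case where x really is a character vector with several elements, here is a hedged base R sketch. The second input string is invented for illustration, and vapply() is swapped in for sapply() as an assumption about what is wanted: it guarantees that the result is a character vector with one element per input string.

```r
# Made-up input in the question's format (the second string is invented).
x <- c(
  "Z1; D49;  Pay-What-You-Want; A1; Moods; Weather; Social norms, K20",
  "B12; Pricing, C3; Fairness"
)

# Split each string on a comma or semicolon plus any following whitespace,
# drop the letter-plus-digits entries, and glue the rest back together.
cleaned <- vapply(
  strsplit(x, "[,;]\\s*"),
  function(parts) {
    paste(parts[!grepl("^[[:alpha:]][[:digit:]]+$", parts)], collapse = "; ")
  },
  character(1)
)

# cleaned[1] is "Pay-What-You-Want; Moods; Weather; Social norms"
# cleaned[2] is "Pricing; Fairness"
```

The pattern here uses [[:alpha:]] rather than \\w for the first character, because \\w also matches digits and underscores, so an entry such as "42" would otherwise be removed as well.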