Regex in R to format string starting with capital letter

I am working with regex in R.

My strings look like 'R- tiger', 'PK- Lion' and 'Elephant'.

I want to extract only 'tiger' or 'Lion'.

df$animalName <- sub('^[A-Z]-','',df$animalName)

When I run the code above, the first letter of 'Elephant' is removed as well, whereas I only want to remove the leading abbreviation from 'tiger' and 'Lion'.

Answer:

Try matching one or more capital letters (with +) and also remove the space after the hyphen:

df$animalName <- sub('^[A-Z]+- ', '', df$animalName)

Output:

> df$animalName
[1] "tiger"    "Lion"     "Elephant"
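To see what the + quantifier changes, here is a small sketch on a plain character vector. The name `animals` is an illustrative assumption standing in for df$animalName. It compares a single-capital pattern with the one-or-more version and uses grepl() to show which elements the pattern actually matches:

```r
# Hypothetical stand-in for df$animalName from the question
animals <- c("R- tiger", "PK- Lion", "Elephant")

# Exactly one capital letter before "- ": "PK- Lion" is not matched
single <- sub("^[A-Z]- ", "", animals)

# One or more capital letters before "- ": both prefixes are removed
multiple <- sub("^[A-Z]+- ", "", animals)

# Which elements the one-or-more pattern matches; "Elephant" has no
# hyphen, so it does not match and sub() returns it unchanged
matched <- grepl("^[A-Z]+- ", animals)

print(single)
print(multiple)
print(matched)
```

Because the pattern is anchored with ^ and requires the hyphen, a name without an abbreviation prefix is never touched.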

Answer:

If you just want to capture the last word(s) of each animal name, without any symbols, you could try:

df <- data.frame(animalName=c("R- tiger", "PK- Lion", "Elephant"), stringsAsFactors=FALSE)
df$animalName <- sub("^.*(?<!\\S)(\\w+(?: \\w+)*)$", "\\1", df$animalName, perl=TRUE)
df

  animalName
1      tiger
2       Lion
3   Elephant
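If the goal is simply the final word, a lookbehind-free alternative (swapped in here, not the answerer's method) is to extract the match directly with base R's regexpr() and regmatches() instead of replacing everything in front of it. This sketch assumes every name ends in a word character:

```r
df <- data.frame(animalName = c("R- tiger", "PK- Lion", "Elephant"),
                 stringsAsFactors = FALSE)

# regexpr() locates the run of word characters that ends each string;
# regmatches() returns the matched text for every element that matched
last_word <- regmatches(df$animalName, regexpr("\\w+$", df$animalName))
print(last_word)
```

Unlike sub(), regmatches() silently drops elements that do not match, so it is worth checking that length(last_word) equals nrow(df) before assigning the result back to the data frame.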

Answer:

Another regex approach using capture groups:

with(df, sub(".*\\s+(\\w+)", "\\1", animalName))
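The prefix-stripping and last-word approaches agree on the question's data but diverge when a name has more than one word. The sketch below adds a hypothetical entry, 'SL- snow leopard', which is not in the question: removing the abbreviation keeps the whole name, while the last-word pattern keeps only 'leopard'.

```r
# "SL- snow leopard" is a made-up example, not from the question
x <- c("R- tiger", "PK- Lion", "Elephant", "SL- snow leopard")

# Remove a leading capital-letter abbreviation, hyphen and space
prefix_removed <- sub("^[A-Z]+- ", "", x)

# Keep only the word after the last run of whitespace
last_word <- sub(".*\\s+(\\w+)", "\\1", x)

print(prefix_removed)
print(last_word)
```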