I am working with regular expressions in R.
My strings look like 'R- tiger', 'PK- Lion', 'Elephant'.
I want to keep only the animal name, e.g. 'tiger' or 'Lion'.
df$animalName <- sub('^[A-Z]-','',df$animalName)
When I execute the above code, the first letter of 'Elephant' gets removed as well, whereas I just want to remove the initial abbreviation from 'tiger' and 'Lion'.
CodePudding user response:
Try matching one or more capital letters (with +) and also remove the space after the hyphen:
df$animalName <- sub('^[A-Z]+- ', '', df$animalName)
Output:
> df$animalName
[1] "tiger" "Lion" "Elephant"
>
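For a self-contained check, here is a minimal sketch that rebuilds the sample data from the question and applies the pattern with the + quantifier. The variable name cleaned is only illustrative.

```r
# Sample data mirroring the strings given in the question
df <- data.frame(animalName = c("R- tiger", "PK- Lion", "Elephant"),
                 stringsAsFactors = FALSE)

# '^[A-Z]+- ' : one or more capital letters at the start of the string,
# followed by a hyphen and a single space.
# "Elephant" contains no hyphen, so the pattern does not match and
# sub() returns it unchanged.
cleaned <- sub("^[A-Z]+- ", "", df$animalName)
print(cleaned)
```

Without the +, the character class matches exactly one letter, so a two-letter prefix such as 'PK' would not be stripped.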
CodePudding user response:
If you just want to capture the trailing word characters of each animal name (with no symbols), you may try:
df <- data.frame(animalName=c("R- tiger", "PK- Lion", "Elephant"), stringsAsFactors=FALSE)
df$animalName <- sub("^.*(?<!\\S)(\\w+(?: \\w+)*)$", "\\1", df$animalName, perl=TRUE)
df
  animalName
1      tiger
2       Lion
3   Elephant
CodePudding user response:
Another regex approach using capture groups:
with(df, sub(".*\\s+(\\w+)", "\\1", animalName))
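As a rough comparison, the sketch below runs this capture-group pattern (assumed to be ".*\\s+(\\w+)") next to a prefix-anchored one on the sample strings. The fourth string, "R- polar bear", is a hypothetical addition that is not in the question, and the names animals, last_word and prefix_only are illustrative.

```r
# Strings from the question, plus one hypothetical multi-word name
animals <- c("R- tiger", "PK- Lion", "Elephant", "R- polar bear")

# Capture-group approach: keep only the word after the last whitespace.
# Strings without whitespace do not match and are returned unchanged.
last_word <- sub(".*\\s+(\\w+)", "\\1", animals)

# Prefix-anchored approach: drop leading capitals, the hyphen,
# and any whitespace that follows it.
prefix_only <- sub("^[A-Z]+-\\s*", "", animals)

print(last_word)
print(prefix_only)
```

The two agree on the original three strings. They differ only on the multi-word name, where the capture-group version keeps just the final word, so the choice depends on whether names with spaces can occur in the data.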