I'm struggling with regex, or in other words: I have no clue how to solve it...
I have several combinations of network elements and want to extract the first part of the name.
On the left side I have the source value and on the right the target I want to achieve:
|Source |Goal |
|:----------------|:----------|
|8000N01 V001 |8000N01 |
|6000N04_860 V001 |6000N04 |
|6999AP001 |6999AP001 |
|8000N01.2 V009 |8000N01.2 |
|8000N01.3 |8000N01.3 |
|8000N0613_86pian |8000N0613 |
|8852ANU146 V001 |8852ANU146 |
|8000Z001_plan |8000Z001 |
The left side starts always with 4 digits. But then the character can vary, also the digits after the character. If there is a point, I want to keep the digits after the point. If there is a space or an underline, it should be ignored.
Data:
library(data.table)
df = data.table(Source=c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009", "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan"),
Goal=c("8000N01", "6000N04", "6467RP001", "8000N01.2", "8000N01.3", "8000N0613", "8852ANU146", "8000Z001"))
I'd be grateful for any help.
CodePudding user response:
You can use ^\\d{4}[A-Za-z]+\\d+(\\.\\d+){0,1}
taking each part of what was asked for in turn.
The left side starts always with 4 digits: ^\\d{4}
But then the character can vary: [A-Za-z]+
also the digits after the character: \\d+
If there is a point, I want to keep the digits after the point: (\\.\\d+){0,1}
df[, Goal_Check:=stringr::str_extract(Source, "^\\d{4}[A-Za-z]+\\d+(\\.\\d+){0,1}")]
df
# Source Goal Goal_Check
#1: 8000N01 V001 8000N01 8000N01
#2: 6000N04_860 V001 6000N04 6000N04
#3: 6467RP001 6467RP001 6467RP001
#4: 8000N01.2 V009 8000N01.2 8000N01.2
#5: 8000N01.3 8000N01.3 8000N01.3
#6: 8000N0613_86pian 8000N0613 8000N0613
#7: 8852ANU146 V001 8852ANU146 8852ANU146
#8: 8000Z001_plan 8000Z001 8000Z001
CodePudding user response:
The following regular expression will match what you are looking for:
^\d{4}[A-Z0-9.]*
matching:
- from the beginning of string
- 4 digits
- followed by any number of characters (0 or more) from the set [A-Z0-9.]
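In R the backslash has to be doubled inside a string literal. A minimal base-R sketch of applying this pattern, assuming the sample values from the question (the names `src`, `pattern` and `extracted` are only illustrative):

```r
# Sample values taken from the question.
src <- c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009",
         "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan")

# Same pattern as above; the backslash is doubled for the R string literal.
pattern <- "^\\d{4}[A-Z0-9.]*"

# regexpr() locates the first match in each string and regmatches() extracts it.
# Caveat: strings without any match are dropped from the result.
extracted <- regmatches(src, regexpr(pattern, src, perl = TRUE))
print(extracted)
```

Note that the character class only covers uppercase letters, digits and the dot, so a lowercase letter inside the name would end the match early.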
CodePudding user response:
Probably we can try
df[, Goal := gsub("(^.*?)(\\s|_).*", "\\1", Source)]
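The same idea, dropping everything from the first space or underscore onward, can also be sketched in base R without capture groups. This is an alternative spelling, not the code above, and it assumes the only separators are a literal space or underscore. It writes to a separate vector (the hypothetical name `goal_check`) so the result can be compared with the expected values from the question:

```r
# Sample values and expected results taken from the question.
src <- c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009",
         "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan")
expected <- c("8000N01", "6000N04", "6467RP001", "8000N01.2",
              "8000N01.3", "8000N0613", "8852ANU146", "8000Z001")

# Remove the first space or underscore and everything after it.
# Strings without either separator are returned unchanged.
goal_check <- sub("[ _].*$", "", src)

# Compare against the expected values.
print(identical(goal_check, expected))
```

Unlike the anchored patterns in the other answers, this approach does not check that the name starts with 4 digits; it only cuts at the separator.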