Regex for several combinations of network element names

Time:05-02

I'm struggling with regex, or in other words: I have no clue how to solve it...

I have several combinations of network elements and want to extract the first part of the name.

On the left side I have the source value and on the right the target I want to achieve:

|Source           |Goal       |
|:----------------|:----------|
|8000N01 V001     |8000N01    |
|6000N04_860 V001 |6000N04    |
|6999AP001        |6999AP001  |
|8000N01.2 V009   |8000N01.2  |
|8000N01.3        |8000N01.3  |
|8000N0613_86pian |8000N0613  |
|8852ANU146 V001  |8852ANU146 |
|8000Z001_plan    |8000Z001   |

The left side always starts with 4 digits. After that, the letter(s) can vary, and so can the digits after the letters. If there is a point, I want to keep the digits after the point. If there is a space or an underscore, it and everything after it should be dropped.
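Read literally, the last two sentences describe a split rather than a match: the goal is everything before the first space or underscore. A minimal base-R sketch of that reading, assuming space and underscore are the only separators that ever occur (the function name `first_part` is hypothetical, chosen for illustration):

```r
# Source values from the table above.
src <- c("8000N01 V001", "6000N04_860 V001", "6999AP001", "8000N01.2 V009",
         "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan")

# Split at every space or underscore and keep only the first piece.
# A string with no separator comes back as a single, unchanged piece,
# so "8000N01.3" keeps its ".3".
first_part <- function(x) {
  pieces <- strsplit(x, "[ _]")
  vapply(pieces, function(p) p[1], character(1))
}

first_part(src)
```

Because the point is never used as a separator here, the digits after it survive without any special handling.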

Data:

library(data.table)

df = data.table(Source=c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009", "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan"),
                Goal=c("8000N01", "6000N04", "6467RP001", "8000N01.2", "8000N01.3", "8000N0613", "8852ANU146", "8000Z001"))

I'm happy for any help.

CodePudding user response:

You can use ^\\d{4}[A-Za-z]+\\d+(\\.\\d+){0,1} to extract what was asked for.

The left side always starts with 4 digits: ^\\d{4}

But then the letters can vary: [A-Za-z]+

and also the digits after the letters: \\d+

If there is a point, I want to keep the digits after the point: (\\.\\d+){0,1}

df[, Goal_Check := stringr::str_extract(Source, "^\\d{4}[A-Za-z]+\\d+(\\.\\d+){0,1}")]
df
#             Source       Goal Goal_Check
#1:     8000N01 V001    8000N01    8000N01
#2: 6000N04_860 V001    6000N04    6000N04
#3:        6467RP001  6467RP001  6467RP001
#4:   8000N01.2 V009  8000N01.2  8000N01.2
#5:        8000N01.3  8000N01.3  8000N01.3
#6: 8000N0613_86pian  8000N0613  8000N0613
#7:  8852ANU146 V001 8852ANU146 8852ANU146
#8:    8000Z001_plan   8000Z001   8000Z001
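If stringr is not available, the same pattern can be applied with base R's `regexpr()` and `regmatches()`. A sketch under that assumption; `perl = TRUE` is a choice made here so that `\\d` is handled by PCRE. Unlike `str_extract()`, which returns `NA` for a string without a match, `regmatches()` silently drops such strings, hence the explicit check:

```r
# Source values from df above.
src <- c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009",
         "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan")

# 4 digits, one or more letters, one or more digits,
# then optionally a "." followed by one or more digits.
pattern <- "^\\d{4}[A-Za-z]+\\d+(\\.\\d+){0,1}"

# regexpr() gives the position of the first match in each string (-1 if none);
# regmatches() returns the matched text but drops non-matching strings.
m <- regexpr(pattern, src, perl = TRUE)
stopifnot(all(m != -1))  # make sure every string matched
goal <- regmatches(src, m)
goal
```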

CodePudding user response:

The following regular expression will match what you are looking for:

^\d{4}[A-Z0-9.]*

It matches:

  • the beginning of the string
  • 4 digits
  • followed by any number (0 or more) of characters from [A-Z0-9.]
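The pattern above is written as a plain regex; inside an R string literal the backslash has to be doubled. A base-R sketch that keeps the matched prefix with a capture group (the use of `sub()` and `perl = TRUE` is a choice made here, not part of the answer). Note that `sub()` returns a string unchanged if it does not start with 4 digits:

```r
src <- c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009",
         "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan")

# In an R string literal \d is written \\d.
# The capture group keeps the leading part; ".*$" discards the rest.
goal <- sub("^(\\d{4}[A-Z0-9.]*).*$", "\\1", src, perl = TRUE)
goal

# The class is case-sensitive, so a lowercase letter ends the match.
sub("^(\\d{4}[A-Z0-9.]*).*$", "\\1", "8000n01", perl = TRUE)  # "8000"
```

Space and underscore stop the match only because they are not in the class, so any other character outside `[A-Z0-9.]` would cut the name short as well.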

CodePudding user response:

We can probably try

df[, Goal := gsub("(^.*?)(\\s|_).*", "\\1", Source)]
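The same call works on a plain character vector outside data.table. A base-R sketch; `perl = TRUE` is a choice made here so that the lazy `.*?` is handled by PCRE:

```r
src <- c("8000N01 V001", "6000N04_860 V001", "6467RP001", "8000N01.2 V009",
         "8000N01.3", "8000N0613_86pian", "8852ANU146 V001", "8000Z001_plan")

# (^.*?) lazily captures the shortest prefix that is followed by
# whitespace (\\s) or an underscore; ".*" then consumes the rest.
# Strings without either separator do not match and are returned unchanged.
goal <- sub("(^.*?)(\\s|_).*", "\\1", src, perl = TRUE)
goal

# The pattern consumes everything to the end of the string, so it can match
# only once per string and gsub() gives the same result as sub().
identical(goal, gsub("(^.*?)(\\s|_).*", "\\1", src, perl = TRUE))
```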