Regex applying to way more text than intended

Time:05-25

Here's an example of the text I'm trying to build a regex for (using 1.1.1.1 as an example for this post):

Nmap scan report for 1.1.1.1
kjerhtehrkererjh
kjhertkjherjtherjkhteter
kjehrjktherther
Nmap scan report for 1.1.1.1
Host is up (0.0011s latency).

PORT     STATE SERVICE
4786/tcp open  smart-install
| cisco-siet: 
|   Host: 1.1.1.1
|_  Status: VULNERABLE
MAC Address: XX:XX:XX:XX:XX:XX (Cisco Systems)

My intention is just to capture:

Nmap scan report for 1.1.1.1
Host is up (0.0011s latency).

PORT     STATE SERVICE
4786/tcp open  smart-install
| cisco-siet: 
|   Host: 1.1.1.1
|_  Status: VULNERABLE
MAC Address: XX:XX:XX:XX:XX:XX (Cisco Systems)

Currently, my regex looks like this: variable_containing_string.scan(/Nmap scan report.*?cisco-siet:.*?Status: VULNERABLE/m)

So here's my output:

irb(main):047:0> d.scan(/Nmap scan report.*?cisco-siet:.*?Status: VULNERABLE/m)
=> ["Nmap scan report for 1.1.1.1\nkjerhtehrkererjh\nkjhertkjherjtherjkhteter\nkjehrjktherther\nNmap scan report for 1.1.1.1\nHost is up (0.0011s latency).\n\nPORT     STATE SERVICE\n4786/tcp open  smart-install\n| cisco-siet: \n|   Host: 1.1.1.1\n|_  Status: VULNERABLE"]

While this does capture my intended target, it also captures the earlier "Nmap scan report" line above it, which is not the goal. The text I'm trying to capture may be embedded in a lot of other text, so I would like each match to contain only one instance of "Nmap scan report", while still capturing every such "group" of text in the input.

Hopefully this makes sense. Any help would be greatly appreciated.

CodePudding user response:

You can use

/Nmap scan report(?:(?!Nmap scan report).)*?cisco-siet:(?:(?!Nmap scan report).)*?Status: VULNERABLE/m

See the regex demo.

Details:

  • Nmap scan report - a fixed string (left-hand delimiter)
  • (?:(?!Nmap scan report).)*? - a tempered greedy token: matches any character, zero or more times but as few as possible, as long as that character does not start another Nmap scan report substring
  • cisco-siet: - a fixed string
  • (?:(?!Nmap scan report).)*? - the same tempered token, so the match still cannot cross into another Nmap scan report substring
  • Status: VULNERABLE - a fixed string (right-hand delimiter).
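As a rough check of the breakdown above, here is a minimal Ruby sketch that runs the pattern against the sample text from the question. The variable names (text, tempered, matches) are illustrative and not from the original post.

```ruby
# Sample input copied from the question; variable names are illustrative.
text = <<~NMAP
  Nmap scan report for 1.1.1.1
  kjerhtehrkererjh
  kjhertkjherjtherjkhteter
  kjehrjktherther
  Nmap scan report for 1.1.1.1
  Host is up (0.0011s latency).

  PORT     STATE SERVICE
  4786/tcp open  smart-install
  | cisco-siet: 
  |   Host: 1.1.1.1
  |_  Status: VULNERABLE
  MAC Address: XX:XX:XX:XX:XX:XX (Cisco Systems)
NMAP

# The tempered token (?:(?!Nmap scan report).)*? refuses to step onto the
# start of another "Nmap scan report", so a match cannot span two reports.
tempered = /Nmap scan report(?:(?!Nmap scan report).)*?cisco-siet:(?:(?!Nmap scan report).)*?Status: VULNERABLE/m

matches = text.scan(tempered)

puts matches.length
puts matches.first
```

With this input, the attempt starting at the first "Nmap scan report" fails because the tempered token cannot reach cisco-siet: without crossing the second header, so the only match starts at the second header. The match ends at Status: VULNERABLE, which means the MAC Address line is not part of it.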

Note that the Onigmo regex engine (used by Ruby) requires the m flag for the . pattern to match line break characters.
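To illustrate the effect of the m flag, here is a small sketch using a two-line string modeled on the Nmap output. The names sample, without_m and with_m are made up for this example.

```ruby
# Two lines separated by a newline; names are illustrative.
sample = "| cisco-siet: \n|_  Status: VULNERABLE"

# Without /m, "." does not match "\n", so the pattern cannot reach the second line.
without_m = sample.match?(/cisco-siet:.*?Status/)

# With /m, "." also matches "\n", so the pattern can span both lines.
with_m = sample.match?(/cisco-siet:.*?Status/m)

puts without_m  # false
puts with_m     # true
```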
