I have the following data (a subset of possible log4j responders, if anyone is interested):
ap://167.172.44.255:1389/LegitimateJavaCla
ap://167.172.44.255:1389/La
ap://167.99.32.139:1389/Basic/ReverseShell/167.99.32.139/99
ldap://x.x.x.x.61k2ev3252274o2ek77941q85t0r9444o.interact.sh/ok6ll9m
ldap://c6ps4rekeidcvgqlsmsgcg37qdoyyknz4.interact.sh/a
ldap://c6ps4rekeidcvgqlsmsgcg37x9ayymcak.interact.sh/a
ldap://c6ps4ipurnhssm2608l0cg37chyyykyhk.interact.sh/a
ldap://c6ps4ipurnhssm2608l0cg37pdyyykbug.interact.sh/a
91fd9fef8958.bingsearchlib.com:39356/
550f7e1deaed.bingsearchlib.com:39356/a
2174d47e8d04.bingsearchlib.com:39356/a
da6d408517b9.bingsearchlib.com:39356/a
5463610592ef.bingsearchlib.com:39356/a
I would like to keep only the FQDN (the host and domain) or the IP, so I tried
(\S*)?(:\/\/)?(?<interesting>.*)(:)?\/
(see https://regex101.com/r/dusRR5/1)
The idea was:
(\S*)? → match or not some letters (ldap, ...)
(:\/\/)? → match or not ://
(?<interesting>.*) → match anything and call it interesting
(:)? → ... but stop at : if there is one
\/ → ... otherwise stop at /
The expected result is
167.172.44.255
167.99.32.139
x.x.x.x.61k2ev3252274o2ek77941q85t0r9444o.interact.sh
c6ps4rekeidcvgqlsmsgcg37qdoyyknz4.interact.sh
c6ps4rekeidcvgqlsmsgcg37x9ayymcak.interact.sh
(...)
But it does not work and my very limited knowledge of regex does not help.
CodePudding user response:
Modified a bit:
^(?:\S*:\/\/)?(\S*?)[:\/]
The capturing group contains what you are interested in. The key is to use the lazy approach (*?) along with the start-of-line anchor (^).
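To see why laziness and the anchor matter, here is a rough Python sketch comparing the question's pattern with an anchored, lazy one in which the optional scheme is matched but kept outside the capturing group. It makes a few assumptions: Python spells named groups (?P<name>...), the slashes are left unescaped because the pattern is a string literal, and the helper names greedy_capture and lazy_capture are made up for illustration. With the question's pattern, the greedy (\S*) first swallows the whole line and backtracks only as far as the last /, so the interesting group ends up empty.

```python
import re

# Three inputs copied from the question.
SAMPLES = [
    "ap://167.99.32.139:1389/Basic/ReverseShell/167.99.32.139/99",
    "ldap://c6ps4rekeidcvgqlsmsgcg37qdoyyknz4.interact.sh/a",
    "91fd9fef8958.bingsearchlib.com:39356/",
]

# The question's attempt, with the named group in Python's (?P<name>...) spelling.
GREEDY = re.compile(r"(\S*)?(://)?(?P<interesting>.*)(:)?/")

# Anchored, lazy variant: the optional scheme is matched but not captured,
# and \S*? stops at the first ':' or '/'.
LAZY = re.compile(r"^(?:\S*://)?(\S*?)[:/]")


def greedy_capture(line):
    """Return what the 'interesting' group holds for the question's pattern."""
    match = GREEDY.search(line)
    return match.group("interesting") if match else None


def lazy_capture(line):
    """Return group 1 of the anchored, lazy pattern (the host or IP)."""
    match = LAZY.match(line)
    return match.group(1) if match else None


for line in SAMPLES:
    # The first value is an empty string for every sample line.
    print(repr(greedy_capture(line)), repr(lazy_capture(line)))
```

The lazy \S*? grows one character at a time and stops as soon as the next character is : or /, which is why it ends at the port separator or the first path slash.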
CodePudding user response:
You can use
^(?:[a-zA-Z0-9]+:\/\/)?(?<interesting>[^:\/]+)
See the regex demo. Details:
^ - start of string
(?:[a-zA-Z0-9]+:\/\/)? - an optional occurrence of any one or more letters/digits and then ://
(?<interesting>[^:\/]+) - Group "interesting": any one or more chars other than : and /.
Remember that you do not have to escape / if you define your regex with a string literal (as in Python or C#, or when using constructor notation in JavaScript, Ruby, etc.).
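For instance, a minimal Python sketch of this approach could look like the following. The pattern is a raw string literal, so / is not escaped, and the named group uses Python's (?P<name>...) spelling. The names HOST_RE and extract_host are hypothetical, and the de-duplication step at the end is only a guess based on the expected output in the question, which lists each host once.

```python
import re
from typing import Optional

# Optional letters/digits scheme followed by "://", then one or more
# characters that are neither ':' nor '/'.
HOST_RE = re.compile(r"^(?:[a-zA-Z0-9]+://)?(?P<interesting>[^:/]+)")


def extract_host(line: str) -> Optional[str]:
    """Return the host or IP at the start of the line, or None if absent."""
    match = HOST_RE.match(line.strip())
    return match.group("interesting") if match else None


# A few lines copied from the question's data.
lines = [
    "ap://167.172.44.255:1389/LegitimateJavaCla",
    "ap://167.172.44.255:1389/La",
    "ldap://c6ps4ipurnhssm2608l0cg37chyyykyhk.interact.sh/a",
    "5463610592ef.bingsearchlib.com:39356/a",
]

hosts = [extract_host(line) for line in lines]

# Assumption: drop repeated hosts while keeping first-seen order.
unique_hosts = list(dict.fromkeys(hosts))
print(unique_hosts)
```

Because the character class excludes both : and /, no lazy quantifier is needed here: the match simply cannot run past the port separator or the path.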