Regex to remove trailing optional garbage

Time:11-17

I want to clean strings that may contain garbage at the end, always separated by a forward slash / and if there is no garbage, there is no separator.

Example > expected output

Foo/Bar > Foo
Foobar > Foobar

I tried several versions like these to extract only the payload, but none of them worked:

(.*)\/.*

(.*)?\/.*

(.*)?\/*.*

And so on. The problem is that I only ever get the first or the second line to match, never both.

What would be the correct expression to extract the wanted information?

CodePudding user response:

Your first and second patterns require a / to be present: they capture everything before it (before the last one if there are several, because .* is greedy). A line without a /, such as Foobar, therefore does not match at all.

The third pattern always matches the whole line: \/* matches zero or more forward slashes, so the greedy capture group consumes the entire line, including any / and what follows it, and the trailing .* has nothing left to match.
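
To check this behaviour, here is a minimal Python sketch (Python is an assumption, since the question names no language) that runs the three patterns from the question against both sample strings. The helper name first_group is made up for illustration.

```python
import re

# The three patterns from the question, unchanged.
PATTERNS = [r"(.*)\/.*", r"(.*)?\/.*", r"(.*)?\/*.*"]

def first_group(pattern, text):
    """Return capture group 1 of a match starting at the beginning of text, or None."""
    m = re.match(pattern, text)
    return m.group(1) if m else None

for pattern in PATTERNS:
    for text in ("Foo/Bar", "Foobar"):
        print(f"{pattern!r:16} {text!r:10} -> {first_group(pattern, text)!r}")
```

The first two patterns give Foo for Foo/Bar but no match for Foobar, while the third returns the full input in both cases.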

You could write the pattern with a capture group for 1 or more word characters as the first part, and an optional second part starting the match from / till the end of the string.

In the replacement you can use the first capture group.

^(\w+)(?:\/.*)?$
  • ^ Start of string
  • (\w+) Capture 1 or more word characters in group 1
  • (?:\/.*)? Optionally match / and the rest of the line (to be removed after the replacement)
  • $ End of string


There is no language listed, but an example using JavaScript:

const regex = /^(\w+)(?:\/.*)?$/gm; // g: replace on every line, m: ^ and $ match per line
const str = `Foo/Bar
Foobar`;
const result = str.replace(regex, "$1");
console.log(result);


Example using Python

import re

regex = r"^(\w+)(?:\/.*)?$"
test_str = ("Foo/Bar\n"
    "Foobar")
# count=0 replaces every match; re.MULTILINE lets ^ and $ match at each line
result = re.sub(regex, r'\1', test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

Output

Foo
Foobar


CodePudding user response:

You can use replace here as:

const cleanString = (str) => str.replace(/\/.*/, "");

console.log(cleanString("Foo/Bar"));
console.log(cleanString("Foobar"));

CodePudding user response:

You can try splitting on / with re.split and selecting the first element of the resulting list. For example, in Python:

import re

string = "Foo/Bar"  # sample input from the question
new_string = re.split('/', string)[0]

CodePudding user response:

This task doesn't need the power of regex; you only need to split on the first slash, e.g. in Python:

test_string.split('/', 1)[0]
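
As a minimal sketch, the split can be wrapped in a small function to see how it handles no slash, one slash, and several slashes. The function name strip_trailing and the extra sample inputs are assumptions for illustration, not part of the question.

```python
def strip_trailing(text: str) -> str:
    """Return the part of text before the first '/', or all of text if it has no '/'."""
    # maxsplit=1 stops after the first slash, so later slashes are never scanned
    return text.split("/", 1)[0]

for s in ("Foobar", "Foo/Bar", "Foo/Bar/Baz", "/Bar"):
    print(repr(s), "->", repr(strip_trailing(s)))
```

Note that a string starting with a slash yields an empty result, which may or may not be what you want.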

I think the reason your regex doesn't work is that Foobar has no / to match on. So for regex you need to handle none, one, or many slashes. Again, in Python:

>>> import re
>>> test = ['foobar', 'foo/bar', 'foo/bar/baz']
>>> for s in test:
...     print(re.findall('^(.*?)(?=/|$)', s))
...
['foobar']
['foo']
['foo']

The regex says: from the start of the string, group all characters (non-greedy) until either a slash or the end of the string.
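
For single-line input, the same idea can also be written with a negated character class instead of a lazy quantifier plus lookahead. This is a different technique from the one in this answer, shown only as a sketch; the names LOOKAHEAD, NEGATED and payload are made up for illustration.

```python
import re

# Pattern from the answer: lazy match up to a slash or the end of the string.
LOOKAHEAD = re.compile(r"^(.*?)(?=/|$)")
# Alternative: match as many non-slash characters as possible from the start.
NEGATED = re.compile(r"^[^/]*")

def payload(text: str) -> str:
    """Return everything before the first '/' (the whole text if there is none)."""
    # [^/]* can match the empty string, so match() never returns None here.
    return NEGATED.match(text).group(0)

for s in ("foobar", "foo/bar", "foo/bar/baz"):
    print(s, "->", payload(s), "|", LOOKAHEAD.match(s).group(1))
```

The two patterns differ on input containing newlines: [^/] matches a newline, while . does not unless re.DOTALL is set.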
