I want to clean strings that may contain garbage at the end, always separated by a forward slash / and if there is no garbage, there is no separator.
Example > expected output
Foo/Bar > Foo
Foobar > Foobar
I tried several versions like these to extract only the payload, but none of them worked:
(.*)\/.*
(.*)?\/.*
(.*)?\/*.*
And so on. The problem is that I only ever get the first or the second line to match.
What would be the correct expression to extract the wanted information?
CodePudding user response:
Your first and second patterns require a / to be present. Because .* is greedy, they capture everything up to the last /, and they give no match for a line such as Foobar, as there is no / present.
The third pattern matches the whole line: \/* matches zero or more forward slashes, so the capture group is free to consume the whole line, and the trailing .* will not match any more characters because the capture group is already at the end of the line.
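To see this behaviour concretely, here is a small Python sketch that runs the three patterns from the question with re.match. The extra input Foo/Bar/Baz and the helper name first_group are assumptions added for illustration; they are not part of the question.

```python
import re

# The three patterns from the question.
patterns = [r"(.*)\/.*", r"(.*)?\/.*", r"(.*)?\/*.*"]
# "Foo/Bar/Baz" is an extra, hypothetical input with two slashes.
samples = ["Foo/Bar", "Foobar", "Foo/Bar/Baz"]


def first_group(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    m = re.match(pattern, text)
    return m.group(1) if m else None


for p in patterns:
    print(p, [first_group(p, s) for s in samples])
```

The first two patterns return None for Foobar because they require a slash, and they capture Foo/Bar for the two-slash input because .* is greedy. The third pattern returns each input unchanged.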
You could write the pattern with a capture group for 1 or more word characters as the first part, and an optional second part that matches from / to the end of the string.
In the replacement you can use the first capture group.
^(\w+)(?:\/.*)?$
^ : Start of string
(\w+) : Capture 1 or more word characters in group 1
(?:\/.*)? : Optionally match / and the rest of the line (to be removed after the replacement)
$ : End of string
There is no language listed, but an example using JavaScript:
// g replaces on every line, m makes ^ and $ match at line boundaries
const regex = /^(\w+)(?:\/.*)?$/gm;
const str = `Foo/Bar
Foobar`;
const result = str.replace(regex, "$1");
console.log(result);
Example using Python
import re
regex = r"^(\w+)(?:\/.*)?$"
test_str = ("Foo/Bar\n"
            "Foobar")
# Keep only group 1 on each line; re.MULTILINE makes ^ and $ match per line
result = re.sub(regex, r'\1', test_str, flags=re.MULTILINE)
if result:
    print(result)
Output
Foo
Foobar
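One caveat worth checking: \w only matches letters, digits and underscores, so a payload that contains a space or a hyphen is left untouched by the pattern above. The sketch below compares it with a variant that accepts any character except / in the payload. The variant pattern, the names word_only and any_payload, and the sample text are assumptions for illustration, not part of the answer above.

```python
import re

# Pattern from the answer: the payload must consist of word characters only.
word_only = re.compile(r"^(\w+)(?:/.*)?$", re.MULTILINE)
# Hypothetical variant: the payload is anything except "/" or a newline.
any_payload = re.compile(r"^([^/\n]+)(?:/.*)?$", re.MULTILINE)

text = "Foo Bar/Baz\nFoo-bar"

# Neither line consists of word characters only, so nothing is replaced here.
print(word_only.sub(r"\1", text))
# The variant strips "/Baz" from the first line and keeps the second as is.
print(any_payload.sub(r"\1", text))
```

Which of the two fits depends on what the payload may contain, which the question does not specify.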
CodePudding user response:
You can use replace here:
const cleanString = (str) => str.replace(/\/.*/, "");
console.log(cleanString("Foo/Bar"));
console.log(cleanString("Foobar"));
CodePudding user response:
You can try a regex split on / and select the first element of the resulting list. For example, in Python:
import re
string = "Foo/Bar"  # example input
new_string = re.split('/', string)[0]  # 'Foo'
CodePudding user response:
This task doesn't need the power of regex; you only need to split on the first slash, e.g. in Python:
test_string.split('/', 1)[0]
I think the reason your regex doesn't work is that Foobar has no / to match on. So for regex you need to handle none, one, or many slashes. Again, in Python:
>>> import re
>>> test = ['foobar', 'foo/bar', 'foo/bar/baz']
>>> for s in test:
...     print(re.findall('^(.*?)(?=/|$)', s))
...
['foobar']
['foo']
['foo']
The regex says: from the start of the string, group all characters (non-greedy) until either a slash or the end of the string.
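As a rough cross-check, the approaches suggested in this thread can be wrapped in small functions and compared on inputs with no, one, or many slashes. This is only a sketch: the function names by_split, by_sub and by_lookahead are made up for illustration, and it assumes single-line inputs.

```python
import re


def by_split(s):
    # Plain string split on the first slash, no regex involved.
    return s.split("/", 1)[0]


def by_sub(s):
    # Remove the first slash and everything after it on the line.
    return re.sub(r"/.*", "", s)


def by_lookahead(s):
    # Lazily capture from the start until a slash or the end of the string.
    return re.findall(r"^(.*?)(?=/|$)", s)[0]


for s in ["foobar", "foo/bar", "foo/bar/baz"]:
    results = {by_split(s), by_sub(s), by_lookahead(s)}
    print(s, "->", results)
```

If the three agree, each printed set contains a single value. For input that may contain newlines, the regex versions would need extra care, since . does not match a newline by default.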