Consider strings with this format:
id-string1-string2-string3.extension
where id, string1, string2 and string3 can be strings of variable length, and extension is an image file extension.
For example, two possible strings could be:
Il2dK-Ud2d9-Kod2d-d9dwo.jpg
j54fwf3da-7jrg-9eujodww-kio98ujk.png
I need a tokenizer method in JavaScript for an Express/Node.js API that takes these strings as input and outputs an object with this format:
{a: id-string1-string2, b: string3, c: extension}
For the example strings this tokenizer should then output:
{a: Il2dK-Ud2d9-Kod2d, b: d9dwo, c: jpg}
{a: j54fwf3da-7jrg-9eujodww, b: kio98ujk, c: png}
I think this can be done with regex. I tried match(/[^-]+/g), but this tokenizes every substring. I need a way to skip the first two "-" characters but couldn't figure out how.
Do you have any ideas? Or could you provide me a better solution instead of using regex? Thanks very much!
CodePudding user response:
You can achieve this using split:
const str = 'Il2dK-Ud2d9-Kod2d-d9dwo.jpg';
const [restStr, c] = str.split('.');
const [a, b] = restStr.split(/-([a-z0-9]+$)/);
const result = { a, b, c };
console.log(result);
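Since the question also asks about avoiding regex, here is a minimal sketch of the same idea using lastIndexOf. The helper name tokenizeByIndex is hypothetical, and the sketch assumes the extension follows the last period and string3 follows the last hyphen before that period.

```javascript
// Hypothetical helper: split on the last '.' and on the last '-' before it.
function tokenizeByIndex(str) {
  const dot = str.lastIndexOf('.');
  if (dot === -1) return null; // no extension present
  const dash = str.lastIndexOf('-', dot);
  if (dash === -1) return null; // no hyphen before the extension
  return {
    a: str.slice(0, dash),       // everything before the last hyphen
    b: str.slice(dash + 1, dot), // between the last hyphen and the last period
    c: str.slice(dot + 1),       // after the last period
  };
}

console.log(tokenizeByIndex('Il2dK-Ud2d9-Kod2d-d9dwo.jpg'));
```

Unlike a character-class pattern, this does not restrict which characters may appear in each part, so any validation would have to be added separately.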
CodePudding user response:
You might use a pattern with capture groups:
^(?<a>[^\s-]+(?:-[^\s-]+)*)-(?<b>[^\s.-]+)\.(?<c>\w+)$
Explanation
^ : Start of string
(?<a>[^\s-]+(?:-[^\s-]+)*) : Named group a, match 1+ chars other than a whitespace char or -, and optionally repeat - followed by 1+ chars other than a whitespace char or -
- : Match literally
(?<b>[^\s.-]+) : Named group b, match 1+ chars other than . or - or a whitespace char
\. : Match .
(?<c>\w+) : Named group c, match 1+ word chars for the extension
$ : End of string
const regex = /^(?<a>[^\s-]+(?:-[^\s-]+)*)-(?<b>[^\s.-]+)\.(?<c>\w+)$/;
[
"id-string1-string2-string3.extension",
"Il2dK-Ud2d9-Kod2d-d9dwo.jpg",
"j54fwf3da-7jrg-9eujodww-kio98ujk.png",
"a-b-c",
"a.b"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m.groups);
}
});
Without named groups, you can use capture groups and create the objects:
const regex = /^([^\s-]+(?:-[^\s-]+)*)-([^\s.-]+)\.(\w+)$/;
[
"id-string1-string2-string3.extension",
"Il2dK-Ud2d9-Kod2d-d9dwo.jpg",
"j54fwf3da-7jrg-9eujodww-kio98ujk.png",
"a-b-c",
"a.b"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log({
"a": m[1],
"b": m[2],
"c": m[3]
});
}
});
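For use inside an API handler, the named-group pattern could be wrapped in a small function, roughly as sketched below. The names TOKEN_RE and tokenize are hypothetical, and returning null for non-matching input is an assumption about how the caller wants to handle bad strings.

```javascript
// Hypothetical wrapper around the named-group pattern.
const TOKEN_RE = /^(?<a>[^\s-]+(?:-[^\s-]+)*)-(?<b>[^\s.-]+)\.(?<c>\w+)$/;

function tokenize(str) {
  const m = TOKEN_RE.exec(str);
  // m.groups has a null prototype, so copy it into a plain object.
  return m ? { ...m.groups } : null;
}

console.log(tokenize('j54fwf3da-7jrg-9eujodww-kio98ujk.png'));
console.log(tokenize('a-b-c')); // null, because there is no extension
```

A caller can then treat a null result as invalid input instead of checking the match object itself.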
CodePudding user response:
To split at the last hyphen or any period:
res = str.split(/-(?![^-]*-)|\./);
See this demo at regex101 or JS demo at tio.run
(?! ... ) : negative lookahead
[^ ... ] : negated character set
|\. : OR match any period
At the position after any hyphen, a negative lookahead (a zero-length assertion/condition) checks that there is no other hyphen ahead with any number of non-hyphens in between. Alternatively, the pattern matches a period.
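To get the requested object from this split, the resulting array can be destructured, as in the sketch below. The function name splitTokens is hypothetical, and the sketch assumes the input contains exactly one period.

```javascript
// Hypothetical helper: split at the last hyphen or at any period,
// then name the three resulting parts.
function splitTokens(str) {
  const [a, b, c] = str.split(/-(?![^-]*-)|\./);
  return { a, b, c };
}

console.log(splitTokens('Il2dK-Ud2d9-Kod2d-d9dwo.jpg'));
```

Because the alternation splits on every period, an input with additional periods produces more than three parts, and the destructuring above would silently drop the extras.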