Use just regexp to split a string into a 'tuple' of filename and extension?

Time:11-25

I know there are easier ways to get file extensions with JavaScript, but partly to practice my regexp skills I wanted to try and use a regular expression to split a filename into two strings, before and after the final dot (. character).

Here's what I have so far

const myRegex = /^((?:[^.]+(?:\.)*)+?)(\w+)?$/;
// match() returns [fullMatch, group1, group2], so the first element is skipped
const [, filename1, extension1] = 'foo.baz.bing.bong'.match(myRegex);
// filename1 = 'foo.baz.bing.'
// extension1 = 'bong'
const [, filename2, extension2] = 'one.two'.match(myRegex);
// filename2 = 'one.'
// extension2 = 'two'
const [, filename3, extension3] = 'noextension'.match(myRegex);
// filename3 = 'noextension'
// extension3 = undefined (the optional group did not participate)

I've tried to use a lookahead (a positive one, (?=...)) to say 'only match a literal . if it's followed by a word that ends in another .', by changing (?:\.)* to (?:\.(?=\w+.))*, like so:

/^((?:[^.]+(?:\.(?=(\w+\.))))*)(\w+)$/gm

But I want to exclude that final period from the first group, and preferably have 'noextension' be matched in the initial group. How can I do that with just regexp?

Here is my regexp scratch file: https://regex101.com/r/RTPRNU/1

CodePudding user response:

If you really want to use regex, I would suggest using two regexes:

// example with 'foo.baz.bing.bong'

const firstString = /^.+(?=\.\w+)./g // match 'foo.baz.bing.'
const secondString = /\w+$/g   // match 'bong'
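
A minimal sketch of how these two patterns might be applied to one input. The helper name splitWithTwoRegexes and the choice to return null for a missing part are assumptions for illustration, not part of the answer:

```javascript
// The two patterns from the answer, with the + quantifiers written out.
const firstString = /^.+(?=\.\w+)./g; // everything up to and including the final dot
const secondString = /\w+$/g;         // trailing run of word characters

// Hypothetical helper: applies both patterns and returns [before, after].
// With the g flag, match() returns an array of matches (or null), so [0] is the match.
function splitWithTwoRegexes(input) {
  const before = input.match(firstString);
  const after = input.match(secondString);
  return [before ? before[0] : null, after ? after[0] : null];
}

console.log(splitWithTwoRegexes('foo.baz.bing.bong')); // [ 'foo.baz.bing.', 'bong' ]
console.log(splitWithTwoRegexes('noextension'));       // [ null, 'noextension' ]
```

Note that the first pattern still includes the final dot, and for a name without a dot it finds nothing while the second pattern matches the whole string, so the caller has to handle that case separately.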

CodePudding user response:

For the first capture group, you could start the match with 1 or more word characters. Then optionally repeat a . and again 1 or more word characters.

Then you can use an optional non capture group matching a . and capturing 1 or more word characters in group 2.

As the second non capture group is optional, the repetition in the first group should be non-greedy (lazy); otherwise the first group would also consume the final . and extension.

^(\w+(?:\.\w+)*?)(?:\.(\w+))?$

The pattern matches

  • ^ Start of string
  • ( Capture group 1
    • \w+(?:\.\w+)*? Match 1+ word characters, and optionally repeat . and 1+ word characters, as few times as possible
  • ) Close group 1
  • (?: Non capture group to match as a whole
    • \.(\w+) Match a . and capture 1+ word chars in capture group 2
  • )? Close non capture group and make it optional
  • $ End of string
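
As a rough sketch, the pattern above could be used from JavaScript like this. The helper name splitFilename and the decision to return an empty string when there is no extension are assumptions for illustration, not part of the answer:

```javascript
// Pattern from the answer: lazy first group, optional ".extension" tail.
const splitPattern = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;

// Hypothetical helper: returns [name, extension], or null if the pattern does not match.
function splitFilename(input) {
  const match = input.match(splitPattern);
  if (match === null) return null;
  // match[0] is the full match; group 2 is undefined when there is no extension,
  // so the destructuring default turns it into an empty string.
  const [, name, extension = ''] = match;
  return [name, extension];
}

console.log(splitFilename('foo.baz.bing.bong')); // [ 'foo.baz.bing', 'bong' ]
console.log(splitFilename('one.two'));           // [ 'one', 'two' ]
console.log(splitFilename('noextension'));       // [ 'noextension', '' ]
```

Because the pattern only allows \w characters between the dots, names such as 'my-file.txt' or '.bashrc' do not match at all, and this helper returns null for them.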
