Regex to filter two conditions: one after another one

Time:10-27

Say I have the strings example_photo_name1.png and example_photo_name2.png. I want to remove everything after the first . and everything before the last _. The expected output is name1 and name2.

I could remove everything after the first . by using the pattern (.*)\.[^\.]*$. However, I do not know how to remove everything before the last _. How can I do this?

CodePudding user response:

You could use the pattern (?<=_)(?!.*_).+(?=\.)

Pattern explanation:

(?<=_) - positive lookbehind - assert that what precedes is an underscore

(?!.*_) - negative lookahead - assert that what follows does not contain any underscore (so we are sure we are just behind the last underscore)

.+ - match one or more of any characters

(?=\.) - positive lookahead - assert that what follows is a dot .


Matched text will be exactly what you want.
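
A minimal sketch of how the lookaround pattern above could be used in JavaScript with String.prototype.match. The helper name extractName is hypothetical, and the sketch assumes a runtime that supports lookbehind (for example a recent Node.js):

```javascript
// Hypothetical helper (not from the thread): returns the text between
// the last "_" and the following ".", or null when nothing matches.
function extractName(fileName) {
  const match = fileName.match(/(?<=_)(?!.*_).+(?=\.)/);
  return match ? match[0] : null;
}

console.log(extractName('example_photo_name1.png')); // name1
console.log(extractName('example_photo_name2.png')); // name2
```

Because .+ is greedy, a name with several dots is matched up to the last dot rather than the first; replacing .+ with [^.]+ would stop at the first dot instead.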

CodePudding user response:

Something like this might work:

const string = 'example_photo_name1.png';
// Keep only capture group 1: the text between the last "_" and the last "."
console.log(string.replace(/^.*_(.*?)\.[^.]*$/, '$1')); // name1

If you want the prefix xxx_ and the suffix .xxx to be optional then you can wrap them in non capturing groups and add the proper quantifier:

/^(?:.*_)?(.*?)(?:\.[^.]*)?$/

This way, strings like these are handled as well:

hello_world => world
world.jpg   => world
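
As a quick check of the optional-group variant, here is a small sketch that runs it over a few inputs. The names optionalRe and baseName are made up for illustration:

```javascript
// Prefix "xxx_" and suffix ".xxx" are both optional in this pattern.
const optionalRe = /^(?:.*_)?(.*?)(?:\.[^.]*)?$/;

// Hypothetical helper: keeps only capture group 1.
function baseName(s) {
  return s.replace(optionalRe, '$1');
}

for (const s of ['example_photo_name1.png', 'hello_world', 'world.jpg', 'world']) {
  console.log(s, '=>', baseName(s));
}
```

A string with neither an underscore nor a dot, such as world, comes back unchanged.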

CodePudding user response:

The (.*)\.[^\.]*$ pattern of yours (used with .replace and $1) removes the last . and the rest of the string.

You can use

text = text.replace(/^.*_|\..*/g, '')

Details:

  • ^.*_ - start of a string, any zero or more chars other than line break chars as many as possible, and then a _ char
  • | - or
  • \..* - a dot and then any zero or more chars other than line break chars as many as possible

See a JavaScript demo:

const texts = ['example_photo_name1.png', 'example_photo_name2.png'];
const re = /^.*_|\..*/g;
for (const text of texts) {
  console.log(text, '=>', text.replace(re, ''))
}

CodePudding user response:

You can first match until the last occurrence of _, then capture in group 1 all chars other than _

Then match the last dot, followed by one or more chars other than dots or underscores up to the end of the string.

In the replacement, use capture group 1, denoted as $1

^.*_([^_]+)\.[^_.]+$

The pattern matches:

  • ^ Start of string
  • .*_ Match until the end of the string, and backtrack till the last occurrence of _
  • ([^_]+) Capture group 1, match 1 or more occurrences of any char other than _
  • \.[^_.]+ Match . and 1 or more occurrences of any char other than . or _
  • $ End of string


const regex = /^.*_([^_]+)\.[^_.]+$/;
[
  "example_photo_name1.png",
  "example_photo_name2.png"
].forEach(s => console.log(s.replace(regex, "$1")));

To prevent crossing newlines, you can add \n to the negated character class.
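
A rough sketch of what that looks like, assuming the input is one multi-line string and the g and m flags are added so each line is handled separately (the flags and the variable names are assumptions, not part of the answer above):

```javascript
// Without \n in the negated classes, a match may span two lines.
const crossing = /^.*_([^_]+)\.[^_.]+$/gm;
// With \n added, every match stays on a single line.
const lineBound = /^.*_([^_\n]+)\.[^_.\n]+$/gm;

const tricky = 'a_b\nc.txt'; // no line has both "_" and "." in the right order
const crossed = tricky.replace(crossing, '$1');   // match runs across the newline
const untouched = tricky.replace(lineBound, '$1'); // nothing matches

const list = 'example_photo_name1.png\nexample_photo_name2.png';
const names = list.replace(lineBound, '$1');

console.log(JSON.stringify(crossed));
console.log(JSON.stringify(untouched));
console.log(JSON.stringify(names));
```

With the first pattern the match starts on the first line and ends on the second; with \n in both classes the tricky input is left as it was, while the well-formed list is reduced line by line.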
