Regex to match everything except a pattern

Time:08-16

Regex noob here struggling with this, which I know will be easy for some of you regex gods out there!

Given the following:

title:      Some title
date:       2022-08-15
tags:       <value to extract>
identifier: 1234567
---------------------------

Some text
some more text

I would like a regex to match everything except the value of tags (i.e. the "<value to extract>" text).

For context, this is supposed to run in Emacs (in case it matters).

EDIT: Just to clarify, as per @phils' question: all I care about is extracting the tags value. However, this is done via a package setting that asks for a regex string, and I don't have much control over how it gets used. It seems to expect a regex that strips what I don't need from the string rather than one that matches what I do want, which is slightly annoying. Also, since it seems to match everything with \\(.\\), I'm guessing it's using the global flag?

Please let me know if any of this isn't clear.

CodePudding user response:

Emacs regular expressions can't trivially express "not foo" for arbitrary values of foo. (The likes of PCRE have non-regular extensions for zero-width negative look-ahead/behind assertions, but in Emacs that sort of functionality is generally done with the support of Lisp code [1].)

You can still do it purely with regexp matching, but it's simply very cumbersome. An Emacs regexp which matches any line which does not begin with tags: is:

^\(?:$\|[^t]\|t[^a]\|ta[^g]\|tag[^s]\|tags[^:]\).*

or if you need to enter it in the elisp double-quoted read syntax for strings:

"^\\(?:$\\|[^t]\\|t[^a]\\|ta[^g]\\|tag[^s]\\|tags[^:]\\).*"

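To see how the alternation above behaves, here is a minimal sketch that tries the double-quoted form against a few of the sample lines from the question. The name `my/not-tags-line-re` is made up for illustration; only `string-match-p`, `dolist` and `message` are standard Emacs functions.

```elisp
;; Hypothetical name, used only for this illustration.
(defconst my/not-tags-line-re
  "^\\(?:$\\|[^t]\\|t[^a]\\|ta[^g]\\|tag[^s]\\|tags[^:]\\).*"
  "Regexp matching any line that does not begin with \"tags:\".")

;; Report, for each sample line, whether the regexp matches it.
(dolist (line '("title:      Some title"
                "date:       2022-08-15"
                "tags:       <value to extract>"
                "identifier: 1234567"
                ""))
  (message "%S => %s"
           line
           (if (string-match-p my/not-tags-line-re line)
               "match"
             "no match")))
```

Each branch of the alternation accepts a line that diverges from "tags:" at one particular character position, so only the tags: line should be reported as "no match". The empty line is covered by the `$` branch.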

[1] In Lisp code you would instead simply check each line to see whether it starts with tags: and, if so, skip it (which is why Emacs generally gets away without the feature you're looking for, but of course that doesn't help you here).
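For comparison, the Lisp-side approach described in the footnote could look roughly like the sketch below. Both function names are hypothetical, and this assumes you are able to run your own code, which is not the case for the package setting in the question.

```elisp
(require 'seq)

;; Hypothetical helper: keep every line except those starting with "tags:".
(defun my/lines-without-tags (text)
  "Return the lines of TEXT that do not start with \"tags:\"."
  (seq-remove (lambda (line) (string-prefix-p "tags:" line))
              (split-string text "\n")))

;; Hypothetical helper: go straight for the value instead.
(defun my/extract-tags (text)
  "Return the value of the first \"tags:\" line in TEXT, or nil."
  (when (string-match "^tags:[ \t]*\\(.*\\)$" text)
    (match-string 1 text)))
```

With plain Lisp available, a positive match on the tags: line is enough, and no hand-expanded negation is needed.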

CodePudding user response:

After playing around with it for a bit and taking inspiration from @phils' answer, I've come up with the following:

"^\\(?:\\(#\\+\\)?\\(?:filetags:\s+\\|tags:\s+\\|title:.*\\|identifier:.*\\|date:.*\\)\\|.*\\)"

I've also added an extra \\(#\\+\\)? to account for Org meta keys, which usually have the format #+key: value.
