Little context: I'm coding some RegEx string substitution in C, using the POSIX regex library. (Thanks to @Jonathan Leffler for the clarification)
I've managed to complete the code for the actual substitution, but I'm struggling with the RegEx itself.
My aim is to find a word (namely, 'parameter') in a long string (it's actually a file read and stored as a string). I know the word is on a line, but it could be preceded or followed by other characters, and I'd like to ignore all results that are preceded by a # anywhere before the first letter of the word.
An example containing all the cases I've thought about:
# default parameter
parameter=3
someparameter=3
a parameter=3
###parameter=3
# ## parameter=3
## # parameter=3
# parameter=3
# parameter=3
something something # parameter=3
parameter=3
#something parameter=3
# something parameter=3
parameter=#3
parameter=
I've quickly found something for the standard cases: ^\s parameter=[0-9], then moved on to tune this to achieve what I want.
After some much-needed research, I ended up using negative lookbehinds, and also discovered that the caret ^ does not work as intended, because the word is not at the start of the string (which, again, is the start of the file).
This is where I am now:
(?<!\#)(?<!\#\s)(?<!\w)parameter=[0-9]
(?<!\#) to ignore results preceded by #
(?<!\#\s) to ignore results preceded by # and one whitespace character
(?<!\w) to ignore results preceded by any other word character (as in [a-zA-Z0-9_])
but it still matches some words preceded by #, for example when more than one character separates the # from the word.
Is there any way to achieve what I've described?
A nice extra would be to also capture any leading whitespace, but only if there is no # before it. This is not a strict requirement for me, though; I can also leave it alone.
EDIT: following @Blindy's suggestion, here's what I'd like to do
Examples of lines in which to match "parameter"
parameter=1
a parameter=12
parameter=345 # some comment
parameter=4
parameter=5 ####
Not to match
#parameter=0
# parameter=11
something # parameter=22
# something else parameter=3333
### something parameter=4312
# parameter=543
CodePudding user response:
If you are reading lines from a file (or standard input), the normal way to deal with comments is to delete them using strchr() to find the start of the comment:
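/* needs <string.h>; line is a writable, NUL-terminated buffer */
char *hash = strchr(line, '#');
if (hash != NULL)
    *hash = '\0';   /* cut the line at the start of the comment */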
Or using strcspn():
line[strcspn(line, "#")] = '\0';
You can zap the newline too (or instead) if you add it to the second argument to strcspn().
Then apply a simple regex with no look-behind needed to find the information you're interested in. Work on a copy of the line if need be.
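A minimal sketch of that two-step approach, assuming one line at a time. The helper name has_live_parameter, the 256-byte copy buffer and the exact pattern are my own choices for illustration, not something from the question. POSIX ERE has no \b, so the word boundary is spelled out as "start of line, or a non-word character".

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `line` holds a parameter=<digits>
 * assignment outside of any comment, 0 if not, -1 if the pattern
 * fails to compile. */
static int has_live_parameter(const char *line)
{
    char copy[256];   /* assumed maximum line length; longer lines are truncated */
    regex_t re;
    int rc;

    assert(line != NULL);

    /* Work on a copy so the caller's buffer keeps its comment. */
    snprintf(copy, sizeof copy, "%s", line);
    copy[strcspn(copy, "#\n")] = '\0';   /* cut at the first '#' or newline */

    /* "(^|[^[:alnum:]_])" stands in for a word boundary, so that
     * "someparameter=3" is rejected. */
    if (regcomp(&re, "(^|[^[:alnum:]_])parameter=[0-9]+",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, copy, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Called on each line read with fgets(), this accepts "a parameter=12" and "parameter=345 # some comment" and rejects "something # parameter=22". In real code you would probably compile the regex once, outside the loop, rather than on every call.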
If you wanted to, you could use a regex such as [[:space:]]*# to find where to delete from; that strips any spaces before the # too. It probably isn't necessary, though.
CodePudding user response:
When you give a list of example lines for a regex, you should say what you expect should happen for them, because what you have in your question is completely unusable.
So with that in mind, my guess of what you want would be simply something like this:
^[^#]*?\b(parameter=[0-9]+)
I.e., any parameter expression that's not preceded by #. You can see it in action here: https://regex101.com/r/a21ram/1