Each document has a value similar to:
https://test.com/MODIF-RRS/D:/D-KGQLUL34TURWW-MODIF-AGENT04/_work/1179/s/test/code.cs
I want to remove the D:/D-KGQLUL34TURWW-MODIF-AGENT04/_work/1179/s/ part so that I am left with https://test.com/MODIF-RRS/test/code.cs. How would I do that?
I have a regex that works in an online tester:
(D:/([a-zA-Z0-9_-]+)/_work/([a-zA-Z0-9_-]+)/s/)
but when I use it I get an error: invalid range: from (95) cannot be > to (93)
CodePudding user response:
I used a pattern_replace char filter with your regex:
POST _analyze
{
  "char_filter": {
    "type": "pattern_replace",
    "pattern": "(D:/([a-zA-Z0-9_-]+)/_work/([a-zA-Z0-9_-]+)/s/)"
  },
  "text": "https://test.com/MODIF-RRS/D:/D-KGQLUL34TURWW-MODIF-AGENT04/_work/1179/s/test/code.cs"
}
Resulting token:
{
  "tokens": [
    {
      "token": "https://test.com/MODIF-RRS/test/code.cs",
      "start_offset": 0,
      "end_offset": 85,
      "type": "word",
      "position": 0
    }
  ]
}
CodePudding user response:
(D:/([a-zA-Z0-9_-]+)/_work/([a-zA-Z0-9_-]+)/s/)
> invalid range: from (95) cannot be > to (93)
ASCII character 95 is _ and ASCII character 93 is ].
The parser thinks _-] is supposed to be a range of characters (similar to A-Z) and rejects it because the ASCII values to the left and right of the - are not in ascending order.
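To make the numbers in the error message concrete, here is a small Python sketch. It is only an analogy: Python's re module is not the engine that produced the error above, and it treats a trailing - in a set as a literal, so the ] is escaped here to force it to act as the end of the range. The helper is_ascending is a made-up name for illustration.

```python
import re

# Code points of the two characters the parser saw around the hyphen.
range_start, range_end = "_", "]"
print(ord(range_start), ord(range_end))  # 95 93


def is_ascending(lo: str, hi: str) -> bool:
    """A range lo-hi only makes sense if lo does not come after hi."""
    return ord(lo) <= ord(hi)


print(is_ascending("A", "Z"))                # True
print(is_ascending(range_start, range_end))  # False

# Python rejects the same descending range once "]" is escaped so that it
# can serve as the range end instead of closing the set.
try:
    re.compile(r"[_-\]]")
    python_rejects = False
except re.error:
    python_rejects = True
print(python_rejects)
```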
As you do not want to specify a range there at all, try escaping the - characters with a leading \ so that the parser knows you mean a literal -, not a range of characters:
(D:/([a-zA-Z0-9_\-]+)/_work/([a-zA-Z0-9_\-]+)/s/)
Note: Depending on how you specify your regex (in JSON?), you may have to escape the \ itself as well, so you'd have to write \\- instead of \-.
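If the regex does end up inside a JSON body, one way to avoid doubling the backslashes by hand is to let a JSON serializer do it. A minimal Python sketch, assuming the quantifier after each character class is + and using a made-up "pattern" key purely for illustration:

```python
import json

# Raw string: each \- is one backslash followed by a hyphen.
pattern = r"(D:/([a-zA-Z0-9_\-]+)/_work/([a-zA-Z0-9_\-]+)/s/)"

# json.dumps escapes every backslash, so \- is written as \\- in the JSON text.
body = json.dumps({"pattern": pattern})
print(body)

# Parsing the JSON again gives back the original single-backslash regex.
round_tripped = json.loads(body)["pattern"]
print(round_tripped == pattern)  # True
```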
Alternatively, it is usually possible to put - first in the set; the parser then realizes it cannot be a range:
(D:/([-a-zA-Z0-9_]+)/_work/([-a-zA-Z0-9_]+)/s/)
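As a quick sanity check, both variants can be tried against the sample URL from the question. This is a sketch in Python rather than in the engine that raised the error, so it only confirms that the two patterns match the same text; it assumes the quantifier after each character class is +.

```python
import re

url = "https://test.com/MODIF-RRS/D:/D-KGQLUL34TURWW-MODIF-AGENT04/_work/1179/s/test/code.cs"
expected = "https://test.com/MODIF-RRS/test/code.cs"

patterns = {
    "escaped hyphen": r"(D:/([a-zA-Z0-9_\-]+)/_work/([a-zA-Z0-9_\-]+)/s/)",
    "hyphen first": r"(D:/([-a-zA-Z0-9_]+)/_work/([-a-zA-Z0-9_]+)/s/)",
}

# Replace the matched build-agent segment with an empty string.
results = {name: re.sub(regex, "", url) for name, regex in patterns.items()}

for name, cleaned in results.items():
    print(f"{name}: {cleaned} (ok={cleaned == expected})")
```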