substring-before the last occurrence of any non-word character (XSLT 2.0/3.0)

Similar to this question, but in XSLT 2.0 or 3.0, and I want to break on the last non-word character (like the regex \W):

Finding the last occurrence of a string xslt 1.0

The input is REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX

The desired output is:

REMOVE-THIS-IS-A-TEST-LINE,XXX,

This works for one delimiter at a time, but I need to break on at least commas, spaces and dashes.

substring('REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX',1,index-of(string-to-codepoints('REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX'),string-to-codepoints(' '))[last()])
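The requirement can be stated independently of XPath: find the position of the last character that belongs to a set of delimiters, then keep everything up to and including it. The sketch below is a small Python model of that logic, not XPath and not part of the question; the names before_last_delimiter and DELIMITERS are hypothetical, and the delimiter set is taken from the question (commas, spaces, dashes).

```python
# Sketch: the question's substring()/index-of() idea, generalized from one
# delimiter to a set of delimiters. DELIMITERS is an assumption based on
# the question (commas, spaces and dashes).
DELIMITERS = frozenset({",", " ", "-"})


def before_last_delimiter(s: str, delimiters: frozenset = DELIMITERS) -> str:
    """Return s up to and including its last delimiter character."""
    # 1-based positions of every delimiter, like index-of() over the codepoints
    positions = [i + 1 for i, ch in enumerate(s) if ch in delimiters]
    if not positions:
        # No delimiter at all: this sketch returns the input unchanged
        return s
    last = positions[-1]  # like [last()]
    # substring($s, 1, $last) corresponds to the slice s[0:last]
    return s[:last]


print(before_last_delimiter("REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX"))
# prints REMOVE-THIS-IS-A-TEST-LINE,XXX,
```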

I am using Oxygen with Saxon 9.9 EE and Antenna House.

Answer:

You could split the string with tokenize(), pick the tokens you want with [] indexing or subsequence(), and then string-join() them back together, but the most concise approach is probably to do it all on the string itself using a regex.
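A rough Python model of the tokenize()/subsequence()/string-join() route for a single comma delimiter shows why it is less direct: tokenize() discards the separators, so the last one has to be re-appended by hand. This is a sketch in Python, not XPath, and the function name is hypothetical.

```python
import re


def before_last_via_tokenize(s: str, delim: str = ",") -> str:
    """Sketch of tokenize() + subsequence() + string-join() for ONE literal delimiter."""
    tokens = re.split(re.escape(delim), s)  # like tokenize($s, ',')
    if len(tokens) < 2:
        # No delimiter found: return the input unchanged (a choice of this sketch)
        return s
    head = tokens[:-1]  # like subsequence($tokens, 1, count($tokens) - 1)
    # like string-join($head, ','), plus the trailing delimiter tokenize() dropped
    return delim.join(head) + delim


print(before_last_via_tokenize("REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX"))
# prints REMOVE-THIS-IS-A-TEST-LINE,XXX,
```

With several delimiters (commas, spaces and dashes) this route breaks down: once tokenize() has split on a character class, the tokens no longer record which separator stood between them, so they cannot be joined back faithfully.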

This XPath, based on comma (,) separators,

replace('REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX', '(.*,).+', '$1')

or this XPath, based on \W (any non-word character),

replace('REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX', '(.*\W).+', '$1')

will return

REMOVE-THIS-IS-A-TEST-LINE,XXX,

as requested.
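To check what '(.*\W).+' does on edge cases, here is a small Python model of it (hypothetical helper names; not XPath). It assumes single-line input, since '.' does not match a newline without the s flag, and it takes XPath's \W from the XML Schema regex rules: \w is every character except the Unicode categories P (punctuation), Z (separators) and C (other), so \W is exactly those categories. That differs from Python's own \W, for example for '_' (non-word in XPath) and '+' (word in XPath).

```python
import unicodedata


def is_xpath_nonword(ch: str) -> bool:
    # XPath/XML Schema \W: Unicode general categories P*, Z* and C*
    return unicodedata.category(ch)[0] in "PZC"


def replace_greedy_last_nonword(s: str) -> str:
    # Model of replace($s, '(.*\W).+', '$1') for a single-line $s.
    # The greedy .* backtracks to the LAST \W that still leaves at least
    # one character for .+, i.e. the last \W at index len(s) - 2 or earlier.
    for i in range(len(s) - 2, -1, -1):
        if is_xpath_nonword(s[i]):
            # The match spans the whole string and is replaced by $1
            return s[: i + 1]
    # No match: replace() returns the input unchanged
    return s


print(replace_greedy_last_nonword("REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX"))
print(replace_greedy_last_nonword("A,B,"))
```

Because .+ needs at least one character after the captured \W, a string that already ends in a delimiter is cut back to the previous one: the model maps "A,B," to "A,".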

Answer:

I would do

replace('REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX', '(\W)\w*$', '$1')
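The same kind of Python model for '(\W)\w*$' (hypothetical names, not XPath; \W again taken as the Unicode categories P, Z and C, as in XML Schema regexes) shows how it works: the only \W that can be followed by word characters all the way to the end of the string is the last one, so that character is kept and everything after it is dropped.

```python
import unicodedata


def is_xpath_nonword(ch: str) -> bool:
    # XPath/XML Schema \W: Unicode general categories P*, Z* and C*
    return unicodedata.category(ch)[0] in "PZC"


def replace_nonword_tail(s: str) -> str:
    # Model of replace($s, '(\W)\w*$', '$1'): the match starts at the last \W
    # and runs to the end of the string; it is replaced by that \W alone.
    for j in range(len(s) - 1, -1, -1):
        if is_xpath_nonword(s[j]):
            return s[: j + 1]
    # No \W anywhere: no match, so the input comes back unchanged
    return s


print(replace_nonword_tail("REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX"))
print(replace_nonword_tail("A,B,"))
```

Unlike '(.*\W).+', this leaves a string that already ends in a delimiter untouched: "A,B," stays "A,B,".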

However, this involves backtracking, so it might be expensive on a long line. To avoid the backtracking, try

string-join(
   analyze-string(
      'REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX', '\W')
      /*[not(position()=last())])
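analyze-string() returns an fn:analyze-string-result element whose children alternate between fn:non-match (runs containing no \W) and fn:match (here, one \W character each); the expression drops the last child and joins the string values of the rest. A Python model of that (hypothetical names, not XPath; \W taken as the Unicode categories P, Z and C) also exposes two edge cases:

```python
import unicodedata


def is_xpath_nonword(ch: str) -> bool:
    # XPath/XML Schema \W: Unicode general categories P*, Z* and C*
    return unicodedata.category(ch)[0] in "PZC"


def analyze_string_children(s: str) -> list:
    # String values of analyze-string($s, '\W')/*, in document order:
    # non-empty fn:non-match runs and single-character fn:match elements.
    children = []
    run = ""
    for ch in s:
        if is_xpath_nonword(ch):
            if run:
                children.append(run)  # fn:non-match
                run = ""
            children.append(ch)  # fn:match
        else:
            run += ch
    if run:
        children.append(run)  # trailing fn:non-match
    return children


def join_all_but_last_child(s: str) -> str:
    # string-join(analyze-string($s, '\W')/*[not(position()=last())])
    return "".join(analyze_string_children(s)[:-1])


print(join_all_but_last_child("REMOVE-THIS-IS-A-TEST-LINE,XXX,XXXXX"))
print(join_all_but_last_child("A,B,"))
print(repr(join_all_but_last_child("ABC")))
```

If the string ends in a non-word character, the last child is an fn:match, so that delimiter is what gets dropped ("A,B," gives "A,B"). If there is no non-word character at all, the only child is dropped and the result is the empty string, whereas the replace() versions return the input unchanged.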