Different behavior of regexp() on Linux and Windows in MATLAB

Time:06-27

I am seeing different behavior from regexp() on Linux and Windows in MATLAB. I am trying to split a string on a separator. Here is a minimal example:

Linux

test_string = '<some_path>/tool/test/unit_test'
seperator = sprintf('%stest%s',filesep,filesep)
regexp(test_string, seperator,'split')

Output:

1×2 cell array
{'<some_path>/tool'}    {'unit_test'}

Windows

test_string = '<some_path>\tool\test\unit_test'
seperator = sprintf('%stest%s',filesep,filesep)
regexp(test_string, seperator,'split')

Output:

1×1 cell array
{'<some_path>\tool\test\unit_test'}

The output of this code snippet on Linux represents the behavior I want. Could anyone explain or point towards resources to understand what is going on?

CodePudding user response:

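The path separator returned by filesep differs between Linux (/) and Windows (\).

The \ character is a regex metacharacter: it introduces escape sequences such as \d (any digit) or \t (a tab). On Windows your pattern becomes \test\, and the leading \t is read as a tab character rather than a backslash followed by t, so the pattern never matches and the string is returned unsplit. To match a literal backslash, escape it by doubling it (\\).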
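To see the effect in isolation, here is a minimal sketch that hard-codes the Windows separator instead of calling filesep, so it should behave the same on either platform. The variable names parts_unescaped and parts_escaped are illustrative and do not come from the question.

```matlab
% Hard-coded Windows-style path so the result does not depend on the platform.
test_string = '<some_path>\tool\test\unit_test';

% Unescaped pattern: the regex engine reads \t as a tab, so nothing matches
% and the whole string comes back as a single piece.
parts_unescaped = regexp(test_string, '\test\', 'split');

% Escaped pattern: each \\ matches one literal backslash.
parts_escaped = regexp(test_string, '\\test\\', 'split');

disp(numel(parts_unescaped))   % number of pieces without escaping
disp(parts_escaped)            % pieces after splitting on the literal \test\
```

With the escaped pattern, the split yields the same two pieces as the Linux run in the question.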

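To escape special regex metacharacters automatically, use regexptranslate(op, str) with op set to 'escape'. Applied to filesep, it produces a pattern that matches the separator literally on either platform: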

seperator = sprintf('%stest%s',regexptranslate('escape',filesep), regexptranslate('escape',filesep))
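As a quick check that this approach is platform-independent, the sketch below runs the split with both separators written out explicitly rather than taken from filesep. The names separators, esc, pattern and results are hypothetical, chosen only for this illustration.

```matlab
% Try both separators explicitly instead of relying on filesep,
% so the same script exercises the Linux and the Windows case.
separators = {'/', '\'};
results = cell(size(separators));

for k = 1:numel(separators)
    sep = separators{k};
    test_string = ['<some_path>', sep, 'tool', sep, 'test', sep, 'unit_test'];

    % Escape the separator once and reuse it on both sides of 'test'.
    esc = regexptranslate('escape', sep);
    pattern = [esc, 'test', esc];

    results{k} = regexp(test_string, pattern, 'split');
    disp(results{k})
end
```

Building the pattern by concatenation rather than sprintf is only a stylistic choice here; it keeps the escaped separator out of any format string.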

The op argument accepts the following values (type of translation, followed by its description):

'escape': Translate all special characters in str, such as '$', '.', '?', '[', so that they are treated as literal characters when used in regexp, regexpi, and regexprep. The translation inserts a backslash, or escape, character, '\', before each special character in str.

'wildcard': Translate all wildcard and '.' characters in str so that they are treated as literal wildcard characters and periods when used in regexp, regexpi, and regexprep. The translation replaces all instances of '*' with '.*', all instances of '?' with '.', and all instances of '.' with '\.'.

'flexible': Replace text in str with a regular expression that matches the text. If you specify 'flexible', then also specify a regular expression to use as a replacement: newStr = regexptranslate('flexible',str,expression). The expression input can be a character vector or string scalar. This syntax is equivalent to newStr = regexprep(str,expression,regexptranslate('escape',expression)).
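For comparison, here is a small sketch of the 'wildcard' translation described above, used to filter file names. The file names and the variables files, pattern and is_match are made up for this example; the ^ and $ anchors are added here so that the whole name has to match.

```matlab
% Translate a shell-style wildcard into a regular expression.
% Per the description above, '*' becomes '.*' and '.' becomes '\.'.
pattern = regexptranslate('wildcard', '*.m');

% Hypothetical file names, for illustration only.
files = {'run_tests.m', 'notes.txt', 'tool.m'};

% Anchor the pattern so the whole name must match, then flag the hits.
% With 'once', regexp returns one start index (or empty) per cell.
starts = regexp(files, ['^', pattern, '$'], 'once');
is_match = ~cellfun(@isempty, starts);

disp(files(is_match))   % names that match the wildcard
```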