I am encountering different behavior of regexp() on Linux and Windows in MATLAB. I am trying to split a string at a separator. Here is a minimal example:
Linux
test_string = '<some_path>/tool/test/unit_test'
seperator = sprintf('%stest%s',filesep,filesep)
regexp(test_string, seperator,'split')
Output:
1×2 cell array
{'<some_path>/tool'} {'unit_test'}
Windows
test_string = '<some_path>\tool\test\unit_test'
seperator = sprintf('%stest%s',filesep,filesep)
regexp(test_string, seperator,'split')
Output:
1×1 cell array
{'<some_path>\tool\test\unit_test'}
The output of this code snippet on Linux represents the behavior I want. Could anyone explain or point towards resources to understand what is going on?
Answer:
The path separators differ between Linux (/) and Windows (\).
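The \ character is a special regex metacharacter: it introduces "regex escapes" such as \d, which matches a digit. On Windows, sprintf('%stest%s',filesep,filesep) therefore produces the pattern \test\, in which \t is read as a tab character rather than as a backslash followed by t, so the pattern never matches the path and regexp returns the string unsplit. To match a literal backslash, the backslash must itself be escaped, i.e. doubled (\\).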
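A minimal sketch of both the failure and the manual fix. It runs on any platform because the Windows-style path is written out literally; the path C:\tool\test\unit_test is a hypothetical example, not taken from the question:

```matlab
% Windows-style path: in single quotes, MATLAB keeps each backslash literally
test_string = 'C:\tool\test\unit_test';   % hypothetical example path

% In a regex, \t means "horizontal tab", not backslash followed by 't',
% so this finds nothing in the path above (the path contains no tabs)
tabHits = regexp(test_string, '\t');

% Doubling each backslash makes the regex match one literal backslash
escapedSep = sprintf('%stest%s', '\\', '\\');     % the pattern \\test\\
parts = regexp(test_string, escapedSep, 'split');
```

Here tabHits is empty, while parts holds the two pieces on either side of \test\.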
To escape all special regex metacharacters, you can use regexptranslate(op, str) with op set to 'escape':
seperator = sprintf('%stest%s',regexptranslate('escape',filesep), regexptranslate('escape',filesep))
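Putting this together, a sketch of a platform-independent version of the question's snippet. It assumes the goal is to split the path at the test directory; fullfile (not used in the original answer) only serves to build a test path with the current platform's separator:

```matlab
% Escape the platform's file separator once, then build the pattern from it
sep = regexptranslate('escape', filesep);   % on Windows this yields \\, an escaped backslash
pattern = [sep 'test' sep];

% fullfile joins the parts with the current platform's separator
test_string = fullfile('some_path', 'tool', 'test', 'unit_test');
parts = regexp(test_string, pattern, 'split');
% parts is {'some_path/tool', 'unit_test'} on Linux and
% {'some_path\tool', 'unit_test'} on Windows
```

Escaping filesep rather than hard-coding '\\' keeps the same code correct on both systems.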
The possible values of op are:
Type of Translation | Description |
---|---|
'escape' | Translate all special characters in str, such as '$', '.', '?', and '[', so that they are treated as literal characters when used in regexp, regexpi, and regexprep. The translation inserts a backslash, or escape, character, '\', before each special character in str. |
'wildcard' | Translate all wildcard and '.' characters in str so that they are treated as literal wildcard characters and periods when used in regexp, regexpi, and regexprep. The translation replaces all instances of '*' with '.*', all instances of '?' with '.', and all instances of '.' with '\.'. |
'flexible' | Replace text in str with a regular expression that matches the text. If you specify 'flexible', then also specify a regular expression to use as a replacement: newStr = regexptranslate('flexible',str,expression). The expression input can be a character vector or string scalar. This syntax is equivalent to newStr = regexprep(str,expression,regexptranslate('escape',expression)). |
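As a hedged illustration of the 'escape' and 'wildcard' rows above (the file names are hypothetical), anchoring each translated pattern with ^ and $ shows the difference between a literal match and a wildcard match:

```matlab
% 'escape': the dot in the file name becomes \. and no longer matches any character
literal = regexptranslate('escape', 'unit_test.m');
isExact = ~isempty(regexp('unit_test.m', ['^' literal '$'], 'once'));
isLoose = ~isempty(regexp('unit_testxm', ['^' literal '$'], 'once'));

% 'wildcard': a shell-style pattern becomes a regex ('*' -> '.*', '.' -> '\.')
wild = regexptranslate('wildcard', '*_test.m');
matchesWild = ~isempty(regexp('unit_test.m', ['^' wild '$'], 'once'));
```

Here isExact and matchesWild are true, while isLoose is false because the escaped dot only matches a literal period.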