I am trying to separate this string into a list using regex:
-y -hwaccel cuda -threads 8 -loglevel error -hide_banner -stats -i - -c:v hevc_nvenc -rc constqp -preset p7 -qp 18 C:\Users\User\Documents\Python\Smoothie\test 124\Resampled_vid.mp4
I am using the following method to separate it:
split(r'(?!\\)' '\s+', f"{Settings[1]}".format(Input=InFile, Output=OutFile))
Output:
['-y', '-hwaccel', 'cuda', '-threads', '8', '-loglevel', 'error', '-hide_banner', '-stats', '-i', '-', '-c:v', 'hevc_nvenc', '-rc', 'constqp', '-preset', 'p7', '-qp', '18', 'C:\\Users\\User\\Documents\\Python\\Smoothie\\test', '124\\Resampled_vid.mp4']
Desired Output:
['-y', '-hwaccel', 'cuda', '-threads', '8', '-loglevel', 'error', '-hide_banner', '-stats', '-i', '-', '-c:v', 'hevc_nvenc', '-rc', 'constqp', '-preset', 'p7', '-qp', '18', 'C:\\Users\\User\\Documents\\Python\\Smoothie\\test 124\\Resampled_vid.mp4']
Is there any way I can avoid splitting inside a file path?
CodePudding user response:
I would use an re.findall approach here:
import re

# Raw string, so that \U and \t in the path are not treated as escape sequences
inp = r"-y -hwaccel cuda -threads 8 -loglevel error -hide_banner -stats -i - -c:v hevc_nvenc -rc constqp -preset p7 -qp 18 C:\Users\User\Documents\Python\Smoothie\test 124\Resampled_vid.mp4"
parts = re.findall(r'[A-Z]+:(?:\\[^\\]+)+\.\w+|\S+', inp)
print(parts)
['-y', '-hwaccel', 'cuda', '-threads', '8', '-loglevel', 'error', '-hide_banner',
'-stats', '-i', '-', '-c:v', 'hevc_nvenc', '-rc', 'constqp', '-preset', 'p7',
'-qp', '18',
'C:\\Users\\User\\Documents\\Python\\Smoothie\\test 124\\Resampled_vid.mp4']
The regex pattern used here says to match, alternatively:
[A-Z]+:(?:\\[^\\]+)+\.\w+    a file path
|                            OR
\S+                          any group of non-whitespace characters
The trick here is to eagerly try to match a file path first. Only if that fails do we fall back to matching one word/term at a time.
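For readability, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch: the names TOKEN_RE and split_args are made up for illustration, and the shortened command string is a stand-in for the real one.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be annotated.
TOKEN_RE = re.compile(
    r"""
    [A-Z]+:          # drive letter(s) followed by a colon, e.g. C:
    (?:\\[^\\]+)+    # one or more backslash-prefixed components (spaces allowed)
    \.\w+            # a file extension such as .mp4
    |                # OR, tried only if no path matches at this position
    \S+              # a run of non-whitespace characters
    """,
    re.VERBOSE,
)


def split_args(command):
    """Hypothetical helper: split a command string, keeping a Windows path whole."""
    return TOKEN_RE.findall(command)


args = split_args(r"-i - -qp 18 C:\Users\User\test 124\Resampled_vid.mp4")
print(args)
# ['-i', '-', '-qp', '18', 'C:\\Users\\User\\test 124\\Resampled_vid.mp4']
```

One caveat with this pattern: because [^\\]+ also matches spaces, the path alternative can swallow later arguments whenever something after the path ends in a dot followed by word characters. For example, r"C:\a\b.mp4 out.txt" comes back as a single item. It behaves as intended when the path is the last argument, as it is in the question.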