I made a simple backup batch file to compress the entire drive, excluding some folders, including a folder with a name in Hebrew. But the command line I use results in Rar.exe also compressing the folder with the Hebrew name, even when I use the short 8.3 folder name as output by dir /x, which is CA05~1 in this case.
This command line did not exclude the folder with the Hebrew name:
"C:\Program Files (x86)\WinRAR\Rar.exe" a -hp123 -r -x*\"12" -x*\"13" -x*\"backup" -x*\"CA05~1" -y -- "G:\backup\bu.rar" "G:\"
Are there any suggestions for a fix in a single command line?
This is just one example of many problems with Hebrew on the command line, so renaming the folder would help, but only in the short run.
Furthermore, the provided code doesn't show all arguments, like in "-x*\".
CodePudding user response:
The Windows command processor cmd.exe is not designed for working with Unicode when processing a batch file. By default it uses a character encoding with just one byte per character, an OEM code page chosen according to the region (country) configured for the account in use. A batch file should therefore contain all characters encoded according to the code page that cmd.exe uses while processing the batch file.
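The OEM code page for Hebrew would be code page 856. A Hebrew folder name like אבג, which is encoded in Unicode UTF-16 Little Endian as the hexadecimal byte stream D0 05 D1 05 D2 05, would be encoded with code page 856 as the hexadecimal byte stream 80 81 82. (Microsoft's list of Windows code page identifiers names 862 as the Hebrew OEM code page. It encodes these three letters with the same bytes 80 81 82.)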
I hope this is correct as Hebrew is displayed and read from right to left and not from left to right. So the first Hebrew letter alef is displayed as third character of the Hebrew word while the third Hebrew letter gimel is displayed as first character of the Hebrew word from a Western European point of view reading from left to right. I have no experience with Hebrew on my Windows computers.
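The byte streams mentioned above can be checked with a few lines of Python. This is only a sketch for verification and not part of the backup solution. It assumes that Python 3 is available, and the codec names cp856 and cp862 are those of the Python standard library. Text is stored in logical order, so alef comes first in the byte stream even though it is displayed rightmost.

```python
# Hebrew folder name in logical (storage) order: alef, bet, gimel.
NAME = "\u05d0\u05d1\u05d2"


def hex_stream(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return " ".join(f"{byte:02X}" for byte in text.encode(encoding))


# cp862 is listed in addition to cp856 as an assumption: it is the
# Hebrew OEM code page named in Microsoft's code page identifier list.
for encoding in ("utf-16-le", "utf-8", "cp856", "cp862"):
    print(f"{encoding:10} {hex_stream(NAME, encoding)}")
```

The first byte pair or byte printed for each encoding belongs to alef, which confirms that the order in the file is independent of the right-to-left display.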
It could work to have a batch file with:
@echo off
%SystemRoot%\System32\chcp.com 856
"%ProgramFiles(x86)%\WinRAR\Rar.exe" a -dh -ep1 -hp123 -idcdp -r "-x*\12\" "-x*\13\" "-x*\backup\" "-x*\אבג\" -y -- "G:\backup\bu.rar" "G:\"
The three Hebrew characters אבג must be stored in the batch file as the hexadecimal byte stream 80 81 82, which means 80 for the first letter א, 81 for the second letter ב and 82 for the third letter ג, displayed together as אבג.
On a computer running Windows 7 with no support for Hebrew code page 856 installed at all, executing the chcp command line outputs the following error message:
Invalid code page
Therefore the code page is not switched from 850 (Western European OEM code page) to 856 (Hebrew OEM code page), as I can see on running the command chcp after batch file processing finished, and Rar.exe included the folder with name אבג.
Another solution could be using a UTF-8 encoded batch file without byte order mark (BOM) with the command lines:
@echo off
%SystemRoot%\System32\chcp.com 65001
"%ProgramFiles(x86)%\WinRAR\Rar.exe" a -dh -ep1 -hp123 -idcdp -r "-x*\12\" "-x*\13\" "-x*\backup\" "-x*\אבג\" -y -- "G:\backup\bu.rar" "G:\"
The three Hebrew characters displayed as אבג must be stored in the batch file as the hexadecimal byte stream D7 90 D7 91 D7 92. This worked on the computer with Windows 7.
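If the text editor in use cannot save UTF-8 without BOM, the batch file could be generated by a small script instead. The following Python sketch is one possible way and not something the solution requires. The function names batch_bytes and write_batch are hypothetical, and the CRLF line endings are my assumption of what a batch file should have.

```python
# Hebrew folder name: alef, bet, gimel.
FOLDER = "\u05d0\u05d1\u05d2"

# The three command lines of the UTF-8 variant of the batch file.
LINES = [
    "@echo off",
    r"%SystemRoot%\System32\chcp.com 65001",
    r'"%ProgramFiles(x86)%\WinRAR\Rar.exe" a -dh -ep1 -hp123 -idcdp -r '
    r'"-x*\12\" "-x*\13\" "-x*\backup\" "-x*' + "\\" + FOLDER + "\\" + r'" '
    r'-y -- "G:\backup\bu.rar" "G:\"',
]


def batch_bytes(lines):
    """Encode the lines as UTF-8 without BOM, using CRLF line endings."""
    # The codec "utf-8" never writes a BOM (unlike "utf-8-sig").
    return ("\r\n".join(lines) + "\r\n").encode("utf-8")


def write_batch(path, lines=LINES):
    """Write the batch file in binary mode so no newline translation occurs."""
    with open(path, "wb") as handle:
        handle.write(batch_bytes(lines))
```

Calling write_batch("backup.bat") with a file name of your choice creates the file. The folder name is written as Unicode escapes in the script, so the script itself can be stored in plain ASCII.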
My favorite solution would be using an ASCII encoded batch file with the following command line:
@"%ProgramFiles(x86)%\WinRAR\Rar.exe" a -dh -ep1 -hp123 -idcdp -r -scul "-x@%~dp0ExcludeList.txt" -y -- "G:\backup\bu.rar" "G:\"
It has the advantage that it is easy to write and always works on any Windows including Windows XP.
This solution requires the text file ExcludeList.txt in the directory of the batch file. This text file must be Unicode encoded as UTF-16 Little Endian with BOM, although Rar.exe also supports UTF-16 LE without BOM and UTF-16 BE with BOM. Windows Notepad saves a text file encoded as UTF-16 LE with BOM when Unicode is selected for the option Encoding in the Save As dialog window before clicking on the button Save.
The Unicode encoded text file ExcludeList.txt should contain the following lines:
*\12\
*\13\
*\backup\
*\אבג\
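As an alternative to Windows Notepad, the list file could also be created with a script. The following Python sketch writes the four lines above as UTF-16 LE with BOM. The function names exclude_list_bytes and write_exclude_list are hypothetical, and the CRLF line endings are an assumption on my side.

```python
import codecs

# Hebrew folder name: alef, bet, gimel.
FOLDER = "\u05d0\u05d1\u05d2"

# Exclusion masks, each with a trailing backslash.
MASKS = ["*\\12\\", "*\\13\\", "*\\backup\\", "*\\" + FOLDER + "\\"]


def exclude_list_bytes(masks):
    """Return the list file content as UTF-16 LE with BOM (FF FE)."""
    text = "\r\n".join(masks) + "\r\n"
    # The BOM is written explicitly so the result does not depend on the
    # byte order of the machine running this script.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def write_exclude_list(path, masks=MASKS):
    """Write the list file in binary mode to avoid newline translation."""
    with open(path, "wb") as handle:
        handle.write(exclude_list_bytes(masks))
```

The file written by write_exclude_list("ExcludeList.txt") must be placed in the directory of the batch file.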
The switch "-x@%~dp0ExcludeList.txt" tells Rar.exe to read the arguments for exclusion from the file ExcludeList.txt in the directory of the batch file. The switch -scul tells Rar.exe that this list file is Unicode (UTF-16) encoded. Rar.exe is a fully Unicode-aware Windows console application.
Rar.exe does not compare a file/folder name against the exclusion list items using the short 8.3 file/folder names. It always uses the long file/folder names, because those names must be stored in the RAR archive file and not the short file/folder names in 8.3 format. There is also no guarantee that the file system creates short 8.3 file/folder names at all, as this is an option on creating the file system (formatting a partition).
The manual of Rar.exe is the text file Rar.txt. It is stored in the program files folder of WinRAR like WinRAR.exe, Rar.exe and UnRAR.exe. It can be opened with a double click and should be read once from top to bottom to understand the used switches.
The Rar manual also describes that if a folder with a specific name like 12, 13, backup or אבג should be excluded regardless of where such a folder is found in the folder hierarchy by using the wildcard character * at the beginning, the folder name must be specified with a trailing \ on Windows and with a trailing / on Linux/Mac to be interpreted as a folder name. Using the arguments "-x*\12" "-x*\13" "-x*\backup" "-x*\אבג", or the lines *\12, *\13, *\backup and *\אבג in the exclusion list file ExcludeList.txt, would result in excluding the files with such a name found anywhere in the folder hierarchy. The directory separator at the end of an exclusion item containing a wildcard character like * or ? is the indication for WinRAR to apply the wildcard pattern to folder names and not to file names.
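When the exclusion items are generated by a script, a small helper can make sure that the trailing directory separator is never forgotten. The following Python sketch only builds the text of an exclusion item and does not emulate how Rar.exe matches names. The function name folder_mask is hypothetical.

```python
def folder_mask(name, separator="\\"):
    """Build an exclusion item for a folder found anywhere in the hierarchy.

    The result starts with the wildcard * and ends with the directory
    separator, which marks the item as a folder pattern.
    """
    # Remove separators the caller may already have added at either end.
    bare = name.strip("\\/")
    return "*" + separator + bare + separator


print(folder_mask("backup"))        # item for Rar.exe on Windows
print(folder_mask("backup", "/"))   # item for rar on Linux/Mac
```

The default separator is the backslash as needed on Windows. The second argument selects the forward slash for Linux/Mac.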