I have a PowerShell script which does many things, and ultimately writes a variable with an integer value to a text file. Below is a simplified example:
$theValue = 531231245
$theValue | Out-File .\Test.txt
I have also tried adding the ToString() method:
$theValue = 531231245
$theValue.ToString() | Out-File .\Test.txt
It produces a text file, and when I double-click on it, there are no surprises. I see theValue in both cases in the text file, clearly as numerical values.
However, when I then try to read it in Python, it produces a strange result:
with open("Test.txt", 'r') as FID:
    theText = FID.read()
print(theText)
Then the output is:
Output : ÿþ5 3 1 2 3 1 2 4 5
This is actually the least weird output, as I've received some strange strings that looked like bytes encoding. I tried decode, readlines and many other things.
I don't understand why I can't properly read the simple string from the text file. Any ideas?
CodePudding user response:
In Windows PowerShell, the Out-File cmdlet produces UTF-16LE ("Unicode") files by default, as does its effective alias, >.
- PowerShell (Core) 7, by contrast, fortunately now consistently defaults to BOM-less UTF-8.
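Thus, you have two options:
- Use Out-File's / Set-Content's -Encoding parameter to produce a file in the character encoding that Python recognizes by default.
- Use the open() function's encoding parameter to match the encoding produced by PowerShell; for Windows PowerShell:
with open("t.txt", 'r', encoding='utf-16') as FID:
    theText = FID.read()
print(theText)
Note that 'utf-16' (rather than 'utf-16le') also consumes the byte order mark that Windows PowerShell writes at the start of the file; with 'utf-16le', the string would begin with an invisible U+FEFF character.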
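To try the second option without running PowerShell, here is a minimal sketch that writes the bytes Windows PowerShell's Out-File would be expected to produce for this value and reads them back as an integer. The exact file layout (a UTF-16LE byte order mark, the digits, then a CRLF newline) is an assumption, and the file name Test_utf16.txt and the helper read_int are hypothetical names used only for illustration. It opens the file with the 'utf-16' codec, which reads the byte order mark to pick the byte order and then discards it.

```python
import codecs
import os
import tempfile


def read_int(path):
    """Read an integer from a UTF-16 text file that starts with a BOM."""
    # 'utf-16' uses the BOM to choose the byte order and strips it.
    with open(path, "r", encoding="utf-16") as fid:
        return int(fid.read().strip())


# Simulate the PowerShell output: BOM + UTF-16LE text + CRLF (assumed layout).
path = os.path.join(tempfile.mkdtemp(), "Test_utf16.txt")
with open(path, "wb") as fid:
    fid.write(codecs.BOM_UTF16_LE + "531231245\r\n".encode("utf-16-le"))

value = read_int(path)
print(value)
```

The strip() call removes the trailing newline before the conversion, so the result can be used directly as a number.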
CodePudding user response:
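ÿþ is the Unicode 65279 character (U+FEFF, the byte order mark) as it appears when the file's first two bytes are decoded with a single-byte encoding. You can remove the non-ASCII characters like this:
with open("Test.txt", 'r') as FID:
    theText = FID.read()
# Drop every character that cannot be encoded as ASCII
string_encode = theText.encode("ascii", "ignore")
string_decode = string_encode.decode()
# output: 5 3 1 2 3 1 2 4 5 (the gaps are NUL characters, not spaces)
print(string_decode)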
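One caveat with this approach, sketched below under the assumption that the file was decoded as cp1252 (a common default on Western-locale Windows) and that it holds a UTF-16LE byte order mark followed by the digits and a CRLF newline: dropping the non-ASCII characters removes ÿþ, but every digit is still followed by a NUL character, so the result cannot be passed to int(). The variable names are illustrative only.

```python
import codecs

# Bytes as Windows PowerShell's Out-File would write them (assumed layout).
raw = codecs.BOM_UTF16_LE + "531231245\r\n".encode("utf-16-le")

# Decoding as cp1252 reproduces the garbled text from the question.
mojibake = raw.decode("cp1252")

# Removing non-ASCII characters drops the BOM bytes but keeps the NULs.
stripped = mojibake.encode("ascii", "ignore").decode()
print(repr(stripped))

# Decoding with the matching codec yields the plain digits.
clean = raw.decode("utf-16").strip()
print(clean)
```

Printing repr(stripped) makes the \x00 characters visible, which a plain print() would show as gaps.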