Reading a text file in Python that was written by a PowerShell script produces unexpected results

Time:05-05

I have a PowerShell script that does many things and ultimately writes a variable holding an integer value to a text file. Below is a simplified example:

$theValue = 531231245
$theValue | Out-File .\Test.txt

I have also tried to add the ToString() method:

$theValue = 531231245
$theValue.ToString() | Out-File .\Test.txt

Both versions produce a text file, and when I double-click it to open it there are no surprises: in both cases I see the value of $theValue, clearly as digits.

However, when I then try to read the file in Python, I get a strange result:

with open("Test.txt", 'r') as FID: 
    theText = FID.read()
print(theText)

Then the output is:

Output : ÿþ5 3 1 2 3 1 2 4 5

This is actually the least weird output; other attempts returned strings that looked like byte-encoded data. I tried decode, readlines and many other things.

I don't understand why I can't properly read the simple string from the text file. Any ideas?

CodePudding user response:

  • In Windows PowerShell, the Out-File cmdlet produces UTF-16LE ("Unicode") files by default, as does the > redirection operator, which is effectively an alias of it.

    • PowerShell (Core) 7+, by contrast, fortunately now consistently defaults to BOM-less UTF-8.
  • Thus, you have two options:

    • Use Out-File's / Set-Content's -Encoding parameter to produce a file in a character encoding that Python reads correctly by default; for ASCII-only content such as this integer, -Encoding ascii is sufficient.

    • Use the open() function's encoding parameter to match the encoding produced by PowerShell; for Windows PowerShell:

      # 'utf-16' reads and drops the BOM; 'utf-16le' would leave it in the text as U+FEFF
      with open("Test.txt", 'r', encoding='utf-16') as FID:
        theText = FID.read()
      print(theText)
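
To make the cause concrete, here is a minimal sketch that reproduces the symptom and then reads the value back as an integer. It does not call PowerShell; it assumes the file consists of the UTF-16LE byte-order mark (FF FE), the digits, and a CRLF newline, which is what Windows PowerShell's Out-File writes by default. The temporary path is a hypothetical stand-in for Test.txt, and latin-1 stands in for whatever single-byte default encoding Python picked up on the asker's machine.

```python
import os
import tempfile

# Assumed file content: UTF-16LE BOM (FF FE) + the digits + CRLF.
raw = b"\xff\xfe" + "531231245\r\n".encode("utf-16-le")

# Hypothetical scratch file standing in for Test.txt.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")
with open(path, "wb") as f:
    f.write(raw)

# Decoding the bytes with a single-byte encoding reproduces the symptom:
# the BOM becomes two visible characters and every digit is followed by NUL.
misread = raw.decode("latin-1")

# encoding="utf-16" uses the BOM to pick the byte order and then drops it.
with open(path, "r", encoding="utf-16") as f:
    text = f.read()

# strip() removes the trailing newline before converting to int.
value = int(text.strip())

print(repr(misread))
print(value)
```

The repr() of the misread string shows that the apparent spaces between the digits are NUL characters (\x00), the high byte of each two-byte UTF-16LE code unit.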
      

CodePudding user response:

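ÿþ is how the byte-order mark (U+FEFF, decimal 65279) appears when the file's first two bytes, FF FE, are decoded with a single-byte encoding. You can strip non-ASCII characters like this, although it only removes the ÿþ prefix and the gaps between the digits remain: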

with open("Test.txt", 'r') as FID: 
    theText = FID.read()
    string_encode = theText.encode("ascii", "ignore")
    string_decode = string_encode.decode()

    # output: 5 3 1 2 3 1 2 4 5
    print(string_decode)
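
A short sketch of what the round trip above keeps and drops, assuming the file is UTF-16LE with a BOM as Windows PowerShell's Out-File writes it. The bytes are built in memory rather than read from Test.txt, and latin-1 is an assumed stand-in for the single-byte encoding that produced ÿþ.

```python
# Assumed file content: UTF-16LE BOM (FF FE) + the digits + CRLF.
raw = b"\xff\xfe" + "531231245\r\n".encode("utf-16-le")

# What open(..., 'r') returns when a single-byte encoding is used.
the_text = raw.decode("latin-1")

# The ASCII round trip drops the two non-ASCII BOM characters ...
stripped = the_text.encode("ascii", "ignore").decode()

# ... but NUL (\x00) is valid ASCII, so one still follows every digit.
print(repr(stripped))

# Removing the NULs and the trailing newline gives a string int() accepts.
cleaned = stripped.replace("\x00", "").strip()
value = int(cleaned)
print(value)
```

This cleanup only works because the content is pure ASCII; decoding the file as UTF-16 in the first place avoids the problem for any content.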