How do I read sys.stdin, but ignoring decoding errors?
I know that sys.stdin.buffer exists, and I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line.
Maybe I can somehow reopen the sys.stdin file, but with the errors='ignore' option?
CodePudding user response:
I found three solutions by following the link Mark Setchell mentioned.
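import sys
import io

def first():
    # Reopen the stdin file descriptor in text mode, dropping undecodable bytes.
    with open(sys.stdin.fileno(), 'r', errors='ignore') as f:
        return f.read()

def second():
    # Wrap the underlying binary buffer in a new text layer.
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='ignore')
    return sys.stdin.read()

def third():
    # Change the error handler of the existing stream in place (Python 3.7+).
    sys.stdin.reconfigure(errors='ignore')
    return sys.stdin.read()

print(first())
#print(second())
#print(third())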
Usage:
$ printf 'a\x80b\n' | python solution.py
ab
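Since the question asks about reading line by line, here is a minimal sketch of the second approach used as a line iterator. The helper name iter_lines_ignoring_errors and the utf-8 default are my own assumptions, not part of the original solutions; the demo feeds it an in-memory byte stream instead of real stdin so it can be run anywhere.

```python
import io

def iter_lines_ignoring_errors(binary_stream, encoding="utf-8"):
    """Yield decoded lines from a binary stream, dropping undecodable bytes.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, errors="ignore")
    for line in text_stream:
        # Iterating a text stream yields one line at a time, newline included.
        yield line.rstrip("\n")

# Demo with an in-memory stream containing two invalid UTF-8 bytes.
demo = io.BytesIO(b"a\x80b\nc\xffd\n")
print(list(iter_lines_ignoring_errors(demo)))  # ['ab', 'cd']
```

For real input, pass sys.stdin.buffer instead of the BytesIO object. Note that the wrapper takes ownership of the buffer, so the underlying stream is closed once the wrapper is closed or garbage collected.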
CodePudding user response:
You can set an error handler through the PYTHONIOENCODING environment variable; this affects both sys.stdin and sys.stdout (sys.stderr always uses "backslashreplace"). PYTHONIOENCODING accepts an optional encoding name and an optional error handler name preceded by a colon, so "UTF8", "UTF8:ignore" and ":ignore" are all valid values.
$ cat so73335410.py
import sys
if __name__ == '__main__':
    data = sys.stdin.read()
    print(data)
$
$ echo hello | python so73335410.py
hello
$ echo hello hello hello hello | zip > hello.zip
adding: - (deflated 54%)
$
$ cat hello.zip | PYTHONIOENCODING=UTF8:ignore python so73335410.py
UYv>
-▒
UY HW@'PKv>
▒-PK,-/>PKmPK/>
$
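If you would rather not depend on the caller's shell, the same PYTHONIOENCODING setting can be applied when launching the script from another Python process. This is only a sketch: the function name run_child_with_ioencoding and the inline child program are made up for illustration, and it assumes the child is started with the same interpreter as the parent.

```python
import os
import subprocess
import sys

def run_child_with_ioencoding(value, payload):
    """Run a small child Python program with PYTHONIOENCODING set to `value`.

    Hypothetical helper: the child simply echoes its stdin to stdout.
    """
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=value)
    child_code = "import sys; sys.stdout.write(sys.stdin.read())"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=payload,            # raw bytes sent to the child's stdin
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout

# The invalid byte 0x80 is dropped by the child's stdin decoder.
out = run_child_with_ioencoding("UTF8:ignore", b"a\x80b\n")
print(out.strip())  # b'ab'
```

Without the ":ignore" part, the child would be expected to fail with a UnicodeDecodeError on the same input when stdin is decoded as strict UTF-8.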