This program uses Python's csv module to process a stream containing a CR/LF-delimited list of comma-separated values (CSV). Instead of getting a list of strings, each holding the text that appears between the delimiters (the commas), I'm getting a list of characters. The program uses subprocess.run() to capture the output of ffprobe, which is rows of data separated by commas and newlines (CSV). Printing that output shows it as expected (i.e., formatted as CSV). The program:
import os
import subprocess
import csv
for file in os.listdir("/Temp/Video"):
    if file.endswith(".mkv"):
        print(os.path.join("/Temp/Video", file))
        ps = subprocess.run(["ffprobe", "-show_streams", "-print_format", "csv", "-i", "/Temp/Video/" + file], capture_output=True, text=True)
        print("------------------------------------------")
        print(ps.stdout)
        print("------------------------------------------")
        reader = csv.reader(ps.stdout)
        for row in reader:
            print(row)
exit(0)
The output from the print(ps.stdout) statement:
stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10,High,video,[0][0][0][0],0x0000,1920,1080,1920,1080,0,0,2,1:1,16:9,yuv420p,40,unknown,unknown,unknown,unknown,left,progressive,1,true,4,N/A,24000/1001,24000/1001,1/1000,0,0.000000,N/A,N/A,N/A,N/A,8,N/A,N/A,N/A,46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,17936509,01:20:18.271791666,115523,10802870592,001011,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libx264,00:01:30.010000000
stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0],0x0000,fltp,48000,3,3.0,0,0,N/A,0/0,0/0,1/1000,0,0.000000,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,3314,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,Surround 3.0,2422660,01:20:18.272000000,451713,1459129736,001100,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libvorbis,00:01:30.003000000
And some of the output from the for loop:
['s']
['t']
['r']
['e']
['a']
['m']
['', '']
['0']
['', '']
['h']
['2']
['6']
['4']
['', '']
['H']
['.']
['2']
['6']
['4']
[' ']
['/']
[' ']
['A']
['V']
['C']
[' ']
['/']
[' ']
['M']
['P']
['E']
['G']
['-']
['4']
[' ']
What I was expecting was this:
['stream', '0', 'h264', 'H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', ...]
['stream', '1', 'vorbis', 'Vorbis', 'unknown', 'audio', '[0][0][0][0]', ...]
Why is row a list of characters and not a list of strings?
Answer:
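Because the program passes text=True, ps.stdout is a single str, not bytes and not a file. csv.reader() accepts any iterable that yields lines, and iterating over a str yields one character at a time, so every character is parsed as its own row (which is why each comma shows up as ['', '']). Split the output into lines first, then loop over them:
lines = ps.stdout.splitlines()
for line in lines:
    cols = line.split(',')
    print(cols[0])  # prints "stream"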
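A self-contained reproduction, with no ffprobe required, may make the cause easier to see. This is a sketch under one assumption: a child Python process (launched via sys.executable) stands in for ffprobe and prints two short CSV lines. With text=True the captured stdout is a str, so csv.reader walks it one character at a time; splitting it into lines first restores one row per line.

```python
import csv
import subprocess
import sys

# Assumption for this demo: a child Python process stands in for ffprobe
# and prints two short CSV lines to stdout.
ps = subprocess.run(
    [sys.executable, "-c", "print('stream,0,h264'); print('stream,1,vorbis')"],
    capture_output=True,
    text=True,
)

# With text=True, stdout is one str (not bytes, not a file object).
print(type(ps.stdout))  # <class 'str'>

# Iterating a str yields characters, so each character is treated as a line.
char_rows = list(csv.reader(ps.stdout))
print(char_rows[:3])  # [['s'], ['t'], ['r']]

# Splitting into lines first gives csv.reader the iterable of lines it expects.
line_rows = list(csv.reader(ps.stdout.splitlines()))
print(line_rows)  # [['stream', '0', 'h264'], ['stream', '1', 'vorbis']]
```

The lone comma characters in the first version are what produce the ['', ''] rows seen in the question's output.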
The list of lines can also be passed to csv.reader, which, unlike a plain split(','), understands CSV quoting. For example:
reader = csv.reader(ps.stdout.splitlines())
for row in reader:
    print(row)
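Where csv.reader pays off over split(',') is a field that contains the delimiter. The line below is hypothetical (not real ffprobe output) and assumes standard CSV quoting, where a field containing a comma is wrapped in double quotes:

```python
import csv

# Hypothetical line: the third field is quoted because it contains a comma.
line = 'stream,1,"Surround 3.0, rear"'

# A naive split breaks the quoted field in two and keeps the quote characters.
naive = line.split(",")
print(naive)  # ['stream', '1', '"Surround 3.0', ' rear"']

# csv.reader honors the quotes and returns three fields.
parsed = next(csv.reader([line]))
print(parsed)  # ['stream', '1', 'Surround 3.0, rear']
```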
You could also wrap the subprocess output in an in-memory file object with io.StringIO, like so:
import csv
from io import StringIO
s = StringIO(ps.stdout)
reader = csv.reader(s, skipinitialspace=True)
for row in reader:
    print(row)
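Putting it together, here is a sketch of the question's loop with the fix applied. It assumes ffprobe is on PATH and keeps the question's /Temp/Video directory; the helper names parse_csv_output, probe_streams, and main are hypothetical, and check=True is an added choice so a failing ffprobe run raises instead of silently producing no rows.

```python
import csv
import os
import subprocess

VIDEO_DIR = "/Temp/Video"  # directory from the question


def parse_csv_output(text):
    """Hypothetical helper: split captured text into lines and parse each as CSV."""
    return list(csv.reader(text.splitlines()))


def probe_streams(path):
    """Hypothetical helper: run ffprobe on one file and return its CSV rows."""
    ps = subprocess.run(
        ["ffprobe", "-show_streams", "-print_format", "csv", "-i", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe exits non-zero
    )
    # ps.stdout is one str, so split it into lines before parsing.
    return parse_csv_output(ps.stdout)


def main():
    for name in os.listdir(VIDEO_DIR):
        if name.endswith(".mkv"):
            path = os.path.join(VIDEO_DIR, name)
            print(path)
            for row in probe_streams(path):
                print(row)


# Only run when the directory from the question actually exists.
if __name__ == "__main__" and os.path.isdir(VIDEO_DIR):
    main()
```

Keeping the parsing in its own function means it can be exercised on a plain string, without ffprobe installed.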