I am hoping someone can help me with this issue as I am lost. I am calling a PowerShell script that produces several lines of output; this is an extract:
7-Zip 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
Scanning the drive:
7 folders, 21 files, 21544 bytes (22 KiB)
Creating archive: conf.tar
Creating archive: conf2.tar
Removing tar file after upload...
Generating Links:
--------------------------------------------------------------
Link_1
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
--------------------------------------------------------------
Link_2
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
My Python script calls the PowerShell script this way:
import subprocess, sys
p = subprocess.Popen(["powershell.exe",
"script.ps1"],
stdout=sys.stdout, shell=True)
p_out, p_err = p.communicate()
print(p_out)
I can see the output on screen when I run the Python script from a PowerShell CLI. Is there a way to extract those links from the output and pass them to Python?
CodePudding user response:
You should have everything in p_out as a string (so you should already have it in Python), and now you can use Python's string functions to extract the links from it. You can split it into lines and look for the lines with https at the beginning, or you can use a regex.
p_out = '''7-Zip 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
Scanning the drive:
7 folders, 21 files, 21544 bytes (22 KiB)
Creating archive: conf.tar
Creating archive: conf2.tar
Removing tar file after upload...
Generating Links:
--------------------------------------------------------------
Link_1
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
--------------------------------------------------------------
Link_2
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..'''
lines = p_out.split('\n')
links = []
for line in lines:
    if line.startswith('http'):
        line = line.strip()  # remove '\n' and spaces
        links.append(line)
for url in links:
    print('url:', url)
Result:
url: https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
url: https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
And if you don't have it in p_out, then you should check p_err.
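The regex alternative mentioned above could look roughly like the sketch below. The sample text and the example.com URLs are placeholders standing in for the real script output, not values from the question.

```python
import re

# Shortened placeholder sample of the script's output (hypothetical URLs).
p_out = """Generating Links:
--------------------------------------------------------------
Link_1
https://example.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256
--------------------------------------------------------------
Link_2
https://example.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256
"""

# re.MULTILINE makes ^ match at the start of every line; \S+ then takes
# everything up to the first whitespace character, i.e. the rest of the URL.
links = re.findall(r"^https?://\S+", p_out, flags=re.MULTILINE)

for url in links:
    print("url:", url)
```

Compared with the loop, this only keeps lines where the URL starts in the first column, so indented links would need a pattern that allows leading whitespace.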
CodePudding user response:
In order to capture stdout and stderr output, you must replace stdout=sys.stdout with stdout=PIPE, stderr=PIPE.
- By contrast, stdout=sys.stdout passes output from the PowerShell call directly through to the console (terminal), so p_out and p_err ended up as None.
There is no need for shell=True (calling via the platform's default shell) in your case - it only slows things down.
Adding universal_newlines=True makes Python automatically report the collected stdout and stderr output as strings.
While you could extract the lines of interest in Python code afterwards, a small addition to your PowerShell call allows you to do that at the source.
Therefore:
import subprocess
from subprocess import PIPE
p = subprocess.Popen(
    ['powershell', '-NoProfile', '-Command', "(./script.ps1) -match '^https://'"],
    stdout=PIPE, stderr=PIPE, universal_newlines=True
)
# Wait for the process to terminate and collect its stdout and stderr output.
p_out, p_err = p.communicate()
# Split the single multi-line string that contains the links
# into individual lines.
lines = p_out.splitlines()
print(lines)
Note:
PowerShell CLI parameters used:
- -NoProfile isn't strictly necessary, but advisable, because it suppresses loading of PowerShell's profiles, which can both help performance and make for a predictable execution environment.
- -Command isn't strictly necessary with powershell.exe, the Windows PowerShell CLI, as it is the implied default; however, it is necessary if you call the PowerShell (Core) 7 CLI, pwsh.exe, which now defaults to -File instead.
The PowerShell code used to extract the links:
- Since your script invokes an external program, 7z.exe, that program's stdout is reported line by line by PowerShell.
- When the regex-based -match operator is given an array as its LHS operand, it acts as a filter. Therefore, only those lines that start with (^) the string https:// are returned.
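If you would rather see the capture behaviour without involving PowerShell, here is a minimal sketch. It assumes a stand-in child process (a one-line Python program printing a placeholder link) in place of powershell.exe and script.ps1, so the command and URL are purely illustrative. It shows that communicate() only returns text when PIPE is used, and does the https:// filtering on the Python side instead of with -match.

```python
import subprocess
import sys

# Stand-in child process (an assumption for illustration): a tiny Python
# program that prints two lines, used in place of powershell.exe + script.ps1.
child_cmd = [
    sys.executable, "-c",
    "print('Link_1'); print('https://example.com/conf.tar')",
]

# With stdout/stderr set to PIPE, communicate() returns the captured text.
p = subprocess.Popen(child_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
captured_out, captured_err = p.communicate()

# Without PIPE, the child's output goes straight to the console and
# communicate() returns (None, None).
p = subprocess.Popen(child_cmd)
passthrough_out, passthrough_err = p.communicate()

# Python-side counterpart of the PowerShell filter  -match '^https://'.
links = [line for line in captured_out.splitlines()
         if line.startswith("https://")]
print(links)
```

Filtering in Python keeps the full output available for logging or debugging, whereas filtering in the PowerShell call hands Python only the link lines.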