Each answer or comment on a StackOverflow question thread has a unique URL. How can we use that URL with Invoke-WebRequest
(or another tool) to capture just the underlying mini-Markdown text of that answer or comment, and then extract useful information from it?
The reason I want this is that some answers contain scripts whose retrieval into .ps1 files I would like to automate on various systems. For example, given the URL https://superuser.com/questions/176624/linux-top-command-for-windows-powershell/1426271#1426271 , I would like to grab just the PowerShell code portion and pipe it into a file called mytop.ps1.
CodePudding user response:
You can use the Stack Exchange REST API to pull the answer directly, in particular the answers-by-id method.
The response used here still doesn't give you the Markdown, but it is easier to drill down to the answer's body using the JSON response than to parse the full page source. In fact, I think getting HTML for the answer body is even better than Markdown, because you consistently get <code> elements instead of having to parse all the different ways code can be formatted in Markdown (e.g. code fences and indentation).
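# Fetch the answer by ID; the 'withbody' filter adds the HTML body to the JSON response.
$answer = Invoke-RestMethod 'https://api.stackexchange.com/2.3/answers/1426271?site=superuser&filter=withbody'
# Collect the contents of every <code> element and decode HTML entities such as &lt; and &amp;.
$codes = [RegEx]::Matches( $answer.items.body, '(?s)<code>(.*?)</code>' ).ForEach{ [System.Net.WebUtility]::HtmlDecode( $_.Groups[1].Value ) }
# Index 6 gives you the PowerShell script for this particular answer only!
$codes[6]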
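To start from the answer's own URL rather than a hard-coded ID, the ID and site can be read out of the URL first. This is only a minimal sketch: the function name ConvertTo-AnswerApiUrl is hypothetical, it assumes the URL ends in the answer ID (as a fragment or as the last path segment), and it assumes the API accepts the site's host name as the site parameter.

```powershell
# Hypothetical helper: builds the answers-by-id API URL from an answer's share URL.
function ConvertTo-AnswerApiUrl {
    param( [Parameter(Mandatory)] [uri] $Url )

    # Prefer the fragment (#1426271); fall back to the last path segment.
    $id = $Url.Fragment.TrimStart('#')
    if( $id -notmatch '^\d+$' ) { $id = $Url.Segments[-1].Trim('/') }
    if( $id -notmatch '^\d+$' ) { throw "No numeric answer ID found in '$Url'." }

    # Assumption: the API accepts the site's host name as the 'site' parameter.
    'https://api.stackexchange.com/2.3/answers/{0}?site={1}&filter=withbody' -f $id, $Url.Host
}

$apiUrl = ConvertTo-AnswerApiUrl 'https://superuser.com/questions/176624/linux-top-command-for-windows-powershell/1426271#1426271'
```

Passing $apiUrl to Invoke-RestMethod and piping the chosen code block to Set-Content mytop.ps1 would then cover the whole path from URL to file.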
As there can be multiple <code> elements, you might want to use heuristics to determine which one contains the PowerShell script, e.g. sort by length and check whether the code consists of multiple lines.
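One possible form of that heuristic is sketched below: keep only the multi-line <code> blocks and take the longest. The function name Select-ScriptBlockText and the sample HTML are made up for illustration, and "longest multi-line block" is just one guess at what identifies the script; it will not be right for every answer.

```powershell
# Hypothetical helper: returns the longest multi-line <code> block from an answer's HTML body.
function Select-ScriptBlockText {
    param( [Parameter(Mandatory)] [string] $Html )

    [RegEx]::Matches( $Html, '(?s)<code>(.*?)</code>' ) |
        ForEach-Object { [System.Net.WebUtility]::HtmlDecode( $_.Groups[1].Value ) } |
        Where-Object { $_.Trim() -match '\n' } |   # keep multi-line blocks only
        Sort-Object Length -Descending |           # longest block first
        Select-Object -First 1
}

# Made-up sample body, standing in for $answer.items.body.
$sampleHtml = @'
<p>Run <code>top</code>:</p>
<pre><code>while ($true) {
  Get-Process | Sort-Object CPU -Descending | Select-Object -First 5
}</code></pre>
<pre><code>a
b</code></pre>
'@

$best = Select-ScriptBlockText $sampleHtml
```

With a real response, Select-ScriptBlockText $answer.items.body would replace the hard-coded $codes[6] index.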