I am trying to parse a server monitoring page that doesn't have any class names. The HTML looks like this:
<div style="float:left;margin-right:50px"><div>Server:VIP Owner</div><div>Server Role:ACTIVE</div><div>Server State:AVAILABLE</div><div>Network State:GY</div>
How do I parse this HTML content into variables like
$Server VIP Owner
$Server_Role Active
$Server_State Available
Since there is no class name, I am struggling to get this extracted.
$htmlcontent.ParsedHtml.getElementsByTagName('div') | ForEach-Object {
    New-Variable -Name $_.className -Value $_.textContent
}
CodePudding user response:
While you are only showing us a very small part of the HTML, it is very likely there are more <div> tags in there. Without an id property or anything else that uniquely identifies the div you are after, you can use a Where-Object clause to find the part you are looking for.
Try
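# adjust the -like pattern to the first label of your block (the sample in the question starts with 'Server:')
$div = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }).outerText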
# if you're on a PowerShell version < 7.1, you need to replace the first colon on each line with an equals sign
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData
# on PowerShell 7.1, you can use the `-Delimiter` parameter instead
#$result = $div | ConvertFrom-StringData -Delimiter ':'
The result is a Hashtable like this:
Name           Value
----           -----
Server Name    VIP Owner
Server State   AVAILABLE
Server Role    ACTIVE
Network State  GY
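To see what the -replace step does on its own, here is a minimal sketch that runs on a hard-coded string instead of the parsed page. The $sample text is an assumption modelled on the HTML in the question, and the 'Last Check' line is a made-up entry added only to show a value that contains colons. The negative lookbehind (?<!:.*) lets a colon match only when no other colon precedes it on the same line, so such a value stays intact. The loop at the end is one possible way to get variables like $Server_Role, as the question asks for.

```powershell
# Hypothetical sample, standing in for the .outerText of the matched div
$sample = "Server Name:VIP Owner`r`nServer Role:ACTIVE`r`nLast Check:10:15:30"

# Replace only the first colon on each line ('.' does not cross a line feed)
$result = $sample -replace '(?<!:.*):', '=' | ConvertFrom-StringData

# Keys keep their spaces, so look them up with a quoted string
$result['Server Role']
$result['Last Check']

# Optionally create variables such as $Server_Role from the keys
foreach ($key in $result.Keys) {
    Set-Variable -Name ($key -replace '\s+', '_') -Value $result[$key]
}
$Server_Role
```

Keeping the data in the hashtable is usually easier to work with than separate variables, but both approaches read the same values.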
Of course, if there are more of these in the report, you'll have to loop over divs with something like this:
$result = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }) | ForEach-Object {
    $_.outerText -replace '(?<!:.*):', '=' | ConvertFrom-StringData
}
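With more than one matching div, $result becomes an array of Hashtables. If you would rather filter and display them as objects, a sketch like the following may help. The $blocks array and its second entry ('Standby', 'PASSIVE') are hypothetical sample data, standing in for the .outerText of each matched div.

```powershell
# Hypothetical text blocks, standing in for the .outerText of each matched div
$blocks = @(
    "Server Name:VIP Owner`r`nServer Role:ACTIVE"
    "Server Name:Standby`r`nServer Role:PASSIVE"
)

$servers = foreach ($block in $blocks) {
    $hash = $block -replace '(?<!:.*):', '=' | ConvertFrom-StringData
    # Casting a hashtable to PSCustomObject turns its keys into properties
    [PSCustomObject]$hash
}

# Property names with spaces need quotes
$active = $servers | Where-Object 'Server Role' -eq 'ACTIVE'
$active.'Server Name'
```

Note that a plain Hashtable does not preserve the order of its keys, so the property order of these objects is not guaranteed.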
OK, so the original question did not show what we are actually dealing with.
Apparently, your HTML contains divs like this:
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
To deal with blocks like that, you need a whole different approach:
# create a List object to store the results
$result = [System.Collections.Generic.List[object]]::new()
# create a temporary ordered dictionary to build the resulting items
$svcHash = [ordered]@{}
foreach ($div in $htmlcontent.ParsedHtml.getElementsByTagName('div')) {
    switch -Regex ($div.InnerText) {
        '^=+' {
            # a separator line closes the block collected so far
            if ($svcHash.Count) {
                # add the completed object to the list
                $result.Add([PsCustomObject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' {
            # split into the property Name and its value
            $name, $value = ($_ -split ':',2).Trim()
            $svcHash[$name] = $value
        }
    }
}
if ($svcHash.Count) {
    # a final service block is still pending. This happens when no closing
    # <div>=======================================</div>
    # was found in the HTML, so we need to add that block to the list ourselves
    $result.Add([PsCustomObject]$svcHash)
}
# output on screen
$result | Format-Table -AutoSize
# output to CSV file
$result | Export-Csv -Path 'X:\services.csv' -NoTypeInformation
Output on screen using the above example:
Service Name Service Status Remarks
------------ -------------- -------
MysqlReplica RUNNING        Change role completed in 1 ms
OCCAS        RUNNING        Change role completed in 30280 ms
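One caveat: the ParsedHtml property only exists in Windows PowerShell, where Invoke-WebRequest can use the Internet Explorer engine. PowerShell 6 and later do not provide it. As a hedged alternative, the sketch below applies the same block-building logic to the raw HTML text with a regular expression. It assumes every value sits in a simple <div>...</div> with no nested tags, as in the sample above, and $html is a hypothetical stand-in for $htmlcontent.Content.

```powershell
# Hypothetical raw HTML, standing in for $htmlcontent.Content
$html = @'
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
'@

$result  = [System.Collections.Generic.List[object]]::new()
$svcHash = [ordered]@{}

# Capture the text of every div that contains no nested tags
foreach ($divMatch in [regex]::Matches($html, '(?i)<div[^>]*>([^<]*)</div>')) {
    $text = $divMatch.Groups[1].Value.Trim()
    if ($text -match '^=+$') {
        # separator line: store the block collected so far, if any
        if ($svcHash.Count) {
            $result.Add([PSCustomObject]$svcHash)
            $svcHash = [ordered]@{}
        }
    }
    elseif ($text -match '^[^:]+:') {
        # split into the property name and its value at the first colon
        $name, $value = ($text -split ':', 2).Trim()
        $svcHash[$name] = $value
    }
}
# the last block has no trailing separator, so add it here
if ($svcHash.Count) { $result.Add([PSCustomObject]$svcHash) }

$result | Format-Table -AutoSize
```

A regular expression is only a reasonable choice here because the markup is flat and predictable. For anything more complex, a real HTML parser is the safer route.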