Home > other >  Webscrape HTML Table
Webscrape HTML Table

Time:08-03

I am currently building a macro with VBA in order to automatically obtain te magnetic declination of a certain point on Earth. I managed to automate the process of filling in the right coordinates and make sure the output is in the right format. First, I wanted to have the output as a CSV-file. The process of automatically downloading that file and then subtracting the correct number was working. However, I know want to use the HTML option as output, since that avoids downloading files and thus makes the process (in my opinion) much faster. The problem I now have, is related to subtracting the magnetic variation from the HTML pop-up. When I am looking into the source code of the webpage, it is unclear to me how to properly obtain the right number, since this number doesn't have an ID and the table it is displayed in has the same class name as the input table...

I think I need to, somehow, loop over the elements of the output table and in that way obtain the magnetic variation.

The magnetic variation is calculated via: https://www.ngdc.noaa.gov/geomag/calculators/magcalc.shtml#declination

The number I am trying to obtain has the following position in the source code: enter image description here

The piece of VBA code I use to automatically generate the output:

Sub ScrapeWebData()
    Dim ieObj As InternetExplorer
    Dim htmlEle As IHTMLElement
    Dim i As Integer
    
    i = 2
    
    Set IE = CreateObject("InternetExplorer.Application")
    'Set ieObj = New InternetExplorer
    IE.Visible = True
    IE.navigate "https://www.ngdc.noaa.gov/geomag/calculators/magcalc.shtml"
    
    Application.Wait Now   TimeValue("00:00:05")
    
    Set doc = IE.document
    
    'Import right latitude and longitude into the Magnetic Declination Calculator
    
    'Fill in the latitude
    doc.getElementById("declinationLat1").Value = ThisWorkbook.Sheets("Sheet1").Range("A1").Value
    
    'Choose North or South
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "S" Then
        doc.all.Item("lat1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "N" Then
        doc.all.Item("lat1Hemisphere")(1).Checked = True
    End If
    
    'Fill in the longitude
    doc.getElementById("declinationLon1").Value = ThisWorkbook.Sheets("Sheet1").Range("C1").Value
    
    'Choose West or East
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "W" Then
        doc.all.Item("lon1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "E" Then
        doc.all.Item("lon1Hemisphere")(1).Checked = True
    End If
    
    'Choose format of the output file to be HTML
    doc.all.Item("resultFormat")(0).Checked = True
    
    doc.getElementById("calcbutton").Click

'Here I tried to obtain the right number from the output, but it doesn't work sadly. 
    
    For Each htmlEle In IE.document.getElementById("declinationResultContents")
        For Each htmlnew In IE.document.getElementsByClassName("shadow")(4).getElementsByTagName("tr")
        
        With ActiveSheet
            .Range("A" & i).Value = htmlnew.Children(0).textContent
        End With
        
        i = i   1
        Next htmlnew
    Next htmlEle
    
End Sub

Here, I use the following cells in my excel sheet: enter image description here

Is someone familiar with web scraping of HTML-tables where the desired number (and table) doesn't have a clear identifier? Could you please help me? Thanks in advance!!

CodePudding user response:

Actually I didn't want to give any more answers about web scraping with IE. But since this page still seems to run flawlessly, I took a look at how the popup works.

But first a hint:
The first line in every VBA module should be Option Explicit. This will first check if all variables are declared when starting a macro. This is sometimes annoying, but can avoid very time-consuming error searches just because there is a typo.

There is more to optimize in your macro, but all in all it works. I also looked at the whole thing because you commented the code in such a way that I understood directly what you wanted to do. That is very pleasant.

Delete the following code block. It does not work like this:

'Here I tried to obtain the right number from the output, but it doesn't work sadly.
For Each htmlEle In IE.document.getElementById("declinationResultContents")
    For Each htmlnew In IE.document.getElementsByClassName("shadow")(4).getElementsByTagName("tr")

    With ActiveSheet
        .Range("A" & i).Value = htmlnew.Children(0).textContent
    End With

    i = i   1
    Next htmlnew
Next htmlEle

Replace it with this code. You should be able to use it to read the data you really need:

'The needed data stands in the last html table of the document
'This table will build when clicking the button
'So we have to wait until the browser has generated the code
'(It would be better to use a loop here and not Application.Wait
'But I use the quick and dirty way here ;-)
Application.Wait Now   TimeValue("00:00:05")

Set htmlEle = IE.document.getElementsByTagName("table")
MsgBox htmlEle(htmlEle.Length - 1).innertext

Edit

Here is the whole macro like it works for me. I started to implement the loop, so that Application.Wait does not have to be used. But that leads to a follow up problem on the page. This can probably also be solved, but the effort is relatively large. So I deleted it to keep the code clear.

Sub ScrapeWebData()
    Const url As String = "https://www.ngdc.noaa.gov/geomag/calculators/magcalc.shtml"
    Dim IE As Object
    Dim doc As Object
    Dim htmlEle As Object
    
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate url
    Application.Wait Now   TimeValue("00:00:10")
    Set doc = IE.document
    
    'Import right latitude and longitude into the Magnetic Declination Calculator
    'Fill in the latitude
    doc.getElementById("declinationLat1").Value = ThisWorkbook.Sheets("Sheet1").Range("A1").Value
    
    'Choose North or South
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "S" Then
        doc.all.Item("lat1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("B1").Value = "N" Then
        doc.all.Item("lat1Hemisphere")(1).Checked = True
    End If
    
    'Fill in the longitude
    doc.getElementById("declinationLon1").Value = ThisWorkbook.Sheets("Sheet1").Range("C1").Value
    
    'Choose West or East
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "W" Then
        doc.all.Item("lon1Hemisphere")(0).Checked = True
    End If
    
    If ThisWorkbook.Sheets("Sheet1").Range("D1").Value = "E" Then
        doc.all.Item("lon1Hemisphere")(1).Checked = True
    End If
    
    'Choose format of the output file to be HTML
    doc.all.Item("resultFormat")(0).Checked = True
    
    doc.getElementById("calcbutton").Click
    
    'The needed data stands in the last html table of the document
    'This table will build when clicking the button
    'So we have to wait until the browser has generated the code
    Application.Wait Now   TimeValue("00:00:05")
    Set htmlEle = IE.document.getElementsByTagName("table")
    'You can write the following result were ever you want
    'Here it's a message box, but you can use also a cell
    'or Debug.Print for testing senses
    MsgBox htmlEle(htmlEle.Length - 1).getElementsByTagName("tbody")(0).getElementsByTagName("td")(1).innertext
End Sub
  • Related