I am trying to automate obtaining a file path from a website. The file path changes every Friday. I-CodePudding

Website and Xpath url here

No login required to obtain this file. The website is "https://rigcount.bakerhughes.com/na-rig-count" and the file is the "North America Rotary Rig Count Pivot Table (Feb 2011 - Current)". Right-click, copy link, gives the current file path, "https://rigcount.bakerhughes.com/static-files/cb205922-552b-4f88-954f-665a2b1c731f". Inspecting the file element shows the Xpath with the file path.

I want to create a power query connection that will update the file path name when I perform a weekly scheduled refresh.

I tried using Google Sheets with the formula:

=IMPORTXML(website,Xpath) =IMPORTXML("https://rigcount.bakerhughes.com","/html/body/div[2]/div/div/div/div[2]/article/div/div[2]/div/div/div[2]/div/div/div/table/tbody/tr[2]/td[2]/div/div/article/div/div/div/div/span[1]/a")

Result was an Error: Could not fetch url: https://rigcount.bakerhughes.com

CodePudding user response：

With cURL and a XPath processor like Xidel (available here for Linux/Mac/Win):

curl -s -L 'https://rigcount.bakerhughes.com/na-rig-count' |
xidel --xpath '//a[text()="North America Rotary Rig Count Pivot Table (Feb 2011 - Current)"]/@href'

/static-files/cb205922-552b-4f88-954f-665a2b1c731f

CodePudding user response：

let
    Source = Web.BrowserContents("https://rigcount.bakerhughes.com/na-rig-count"),
    #"Open links" = Html.Table(
        Source
        , {{"link text", "a"}, {"url", "a", each[Attributes][href]?}}
        , [RowSelector = "a"]
    ),
    #"Locate Current target URL" = Table.SelectRows(
        #"Open links"
        , each 
            Text.Contains([#"link text"], "North America Rotary Rig Count Pivot Table")
            and Text.Contains([#"link text"], "Current")
    )
in
    #"Locate Current target URL"