Home > Software design >  Finding a table on a webpage with Power Query - table is not always in the same location
Finding a table on a webpage with Power Query - table is not always in the same location

Time:12-16

I use Power Query to scrape a webpage that has multiple tables on it. The tables can move around so one week it may be the second table, the next the 3rd, then back to the 2nd the next week. The table data does have the same text in the first column so I can select it if I can find the entry.

The result of the first step is:

My code is as follows. The issue is the Source{2} can be a different number based on the actual page layout.

let
    Source = Web.Page(Web.Contents("https://www.eia.gov/naturalgas/weekly/#tabs-supply-1")),
    Data0 = Source{2}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","U.S. natural gas supply - ","",Replacer.ReplaceText,{"Column1"}),
    RecRepl = Table.ReplaceValue(#"Replaced Value",null, each _[Column1], Replacer.ReplaceValue,{"Column2"}),
    #"Removed Columns" = Table.RemoveColumns(RecRepl,{"Column1"}),
    #"Removed Top Rows" = Table.Skip(#"Removed Columns",1),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true])
in
    #"Promoted Headers"

How can I search the table in the Data column to find the row I need?

Thanks!!!

I've tried a variety of steps but have been unable to find the table.

CodePudding user response:

try this

This code looks for the phrase Henry Hub in column zero of all the tables in the Data column, and figures out which one has it. It then filters for true and pulls the contents (of the first match). You can adjust what you are looking for in terms of column to search, text to search for, and take it from there

let Source = Web.Page(Web.Contents("https://www.eia.gov/naturalgas/weekly/#tabs-supply-1")),
#"Added Custom" = Table.AddColumn(Source, "Custom", each List.Contains(Table.Column([Data],Table.ColumnNames([Data]){0}),"Henry Hub")),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Custom] = true),
Data0 = #"Filtered Rows"{0}[Data]
in Data0

enter image description here

  • Related