Home > OS >  Avoid Excel PowerQuery error when second data page headings not available
Avoid Excel PowerQuery error when second data page headings not available

Time:11-05

Using PowerQuery I import data from a CSV that looks like this:

Report Title,,,,,
,Date,Type,USER_ID,PICKED_QTY,No of Hours
,31/10/2021,Type A,User_1,300,3
,31/10/2021,Type A,User_3,250,8
,01/11/2021,Type B,User_1,167,5
,01/11/2021,Type C,User_2,988,2
,02/11/2021,Type A,User_1,1113,4
Date,Type,USER_ID,PICKED_QTY,No of Hours,
03/11/2021,Type C,User_1,1500,5,
04/11/2021,Type A,User_1,200,8,    

sometimes it looks like this (no second page) - which is where the problem is:

Report Title,,,,,
,Date,Type,USER_ID,PICKED_QTY,No of Hours
,31/10/2021,Type A,User_1,300,3
,31/10/2021,Type A,User_3,250,8
,01/11/2021,Type B,User_1,167,5
,01/11/2021,Type C,User_2,988,2
,02/11/2021,Type A,User_1,1113,4

I get the data into a readable format using this PQ (this source would be different, but references a table here for simplicity):

DataSource:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    SplitData = Table.SplitColumn(Source,"Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv))
in
    SplitData 

enter image description here


I then use two queries to line the data up, so the Date columns are both in the same column, etc.

Query1:

let
    Source = DataSource,
    RemoveTopRows = Table.Skip(Source,1),
    PromoteHeaders = Table.PromoteHeaders(RemoveTopRows, [PromoteAllScalars=true]),
    FilterRows = Table.SelectRows(PromoteHeaders, each ([#""] = "")),
    RemoveOtherColumns = Table.SelectColumns(FilterRows,{"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"}),
    ChangeType = Table.TransformColumnTypes(RemoveOtherColumns,{{"Date", type date}, {"Type", type text}, 
                                                                {"USER_ID", type text}, {"PICKED_QTY", Int64.Type}, 
                                                                {"No of Hours", Int64.Type}})
in
    ChangeType  

enter image description here

Query2:

let
    Source = DataSource,
    RemoveTopRows = Table.Skip(Source,1),
    FilterRows = Table.SelectRows(RemoveTopRows, each ([Column1.1] <> "")),
    PromoteHeaders = Table.PromoteHeaders(FilterRows, [PromoteAllScalars=true]),
    RemoveOtherColumns = Table.SelectColumns(PromoteHeaders,{"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"}),
    FilterRows2 = Table.SelectRows(RemoveOtherColumns, each ([Date] <> "Date")),
    ChangeType = Table.TransformColumnTypes(FilterRows2,{{"Date", type date}, {"Type", type text}, {"USER_ID", type text}, 
                                                         {"PICKED_QTY", Int64.Type}, {"No of Hours", Int64.Type}})
in
    ChangeType  

enter image description here


Finally, I join the previous two queries together and group to get my final table.

Query3:

let
    Source = Query1,
    AppendQueries = Table.Combine({Source, Query2}),
    SortRows = Table.Sort(AppendQueries,{{"Date", Order.Ascending}}),
    GroupRows = Table.Group(SortRows, {"Date", "Type"}, {{"Picked Qty", each List.Sum([PICKED_QTY]), type nullable number}, 
                                                         {"Total Hours", each List.Sum([No of Hours]), type nullable number}}),
    AddDivision = Table.AddColumn(GroupRows, "Rate", each [Picked Qty] / [Total Hours], type number)
in
    AddDivision

enter image description here


Question

Sometimes my raw data doesn't include a second page of data, so there's no need for Query2.
When this happens, if I don't manually add the headers for the second page I get an error: [Expression Error] The column 'Date' of the table wasn't found.

How do I avoid this? The error appears in Query2 with RemoveOtherColumns - without column headers it can't find the correct column, and in Query3 as it can't append a query that's returning an error.

CodePudding user response:

Without re-writing all of it, you could just change the last line of Query2 to be

in try ChangeType otherwise Table.FromRecords({[Date = null, Type = null, USER_ID=null, PICKED_QTY=null, No of Hours = null]})

or

in try ChangeType otherwise Table.Skip(Table.FromRecords({[Date = null, Type = null, USER_ID=null, PICKED_QTY=null, No of Hours = null]}),1)

creating:

enter image description here


or just do the whole thing in one query

let Source = Csv.Document(File.Contents("C:\temp2\data.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Removed Top Rows" = Table.Skip(Source,1),
#"Filtered Rows" = Table.PromoteHeaders(Table.SelectRows(#"Removed Top Rows", each [Column1] = ""), [PromoteAllScalars=true]),
#"Filtered Rows2" = Table.PromoteHeaders(Table.SelectRows(#"Removed Top Rows", each [Column1] <> ""), [PromoteAllScalars=true]),
AppendQueries = Table.Combine({#"Filtered Rows",#"Filtered Rows2"}),
SortRows = Table.Sort(AppendQueries,{{"Date", Order.Ascending}}),
#"Changed Type1" = Table.TransformColumnTypes(SortRows,{{"PICKED_QTY", type number}, {"No of Hours", type number}}),
GroupRows = Table.Group(#"Changed Type1", {"Date", "Type"}, {{"Picked Qty", each List.Sum([PICKED_QTY]), type number}, {"Total Hours", each List.Sum([No of Hours]), type number}}),
AddDivision = Table.AddColumn(GroupRows, "Rate", each [Picked Qty] / [Total Hours], type number)
in AddDivision
  • Related