Using PowerQuery I import data from a CSV that looks like this:
Report Title,,,,,
,Date,Type,USER_ID,PICKED_QTY,No of Hours
,31/10/2021,Type A,User_1,300,3
,31/10/2021,Type A,User_3,250,8
,01/11/2021,Type B,User_1,167,5
,01/11/2021,Type C,User_2,988,2
,02/11/2021,Type A,User_1,1113,4
Date,Type,USER_ID,PICKED_QTY,No of Hours,
03/11/2021,Type C,User_1,1500,5,
04/11/2021,Type A,User_1,200,8,
sometimes it looks like this (no second page) - which is where the problem is:
Report Title,,,,,
,Date,Type,USER_ID,PICKED_QTY,No of Hours
,31/10/2021,Type A,User_1,300,3
,31/10/2021,Type A,User_3,250,8
,01/11/2021,Type B,User_1,167,5
,01/11/2021,Type C,User_2,988,2
,02/11/2021,Type A,User_1,1113,4
I get the data into a readable format using this PQ (this source would be different, but references a table here for simplicity):
DataSource:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
SplitData = Table.SplitColumn(Source,"Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv))
in
SplitData
I then use two queries to line the data up, so the Date columns are both in the same column, etc.
Query1:
let
Source = DataSource,
RemoveTopRows = Table.Skip(Source,1),
PromoteHeaders = Table.PromoteHeaders(RemoveTopRows, [PromoteAllScalars=true]),
FilterRows = Table.SelectRows(PromoteHeaders, each ([#""] = "")),
RemoveOtherColumns = Table.SelectColumns(FilterRows,{"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"}),
ChangeType = Table.TransformColumnTypes(RemoveOtherColumns,{{"Date", type date}, {"Type", type text},
{"USER_ID", type text}, {"PICKED_QTY", Int64.Type},
{"No of Hours", Int64.Type}})
in
ChangeType
Query2:
let
Source = DataSource,
RemoveTopRows = Table.Skip(Source,1),
FilterRows = Table.SelectRows(RemoveTopRows, each ([Column1.1] <> "")),
PromoteHeaders = Table.PromoteHeaders(FilterRows, [PromoteAllScalars=true]),
RemoveOtherColumns = Table.SelectColumns(PromoteHeaders,{"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"}),
FilterRows2 = Table.SelectRows(RemoveOtherColumns, each ([Date] <> "Date")),
ChangeType = Table.TransformColumnTypes(FilterRows2,{{"Date", type date}, {"Type", type text}, {"USER_ID", type text},
{"PICKED_QTY", Int64.Type}, {"No of Hours", Int64.Type}})
in
ChangeType
Finally, I join the previous two queries together and group to get my final table.
Query3:
let
Source = Query1,
AppendQueries = Table.Combine({Source, Query2}),
SortRows = Table.Sort(AppendQueries,{{"Date", Order.Ascending}}),
GroupRows = Table.Group(SortRows, {"Date", "Type"}, {{"Picked Qty", each List.Sum([PICKED_QTY]), type nullable number},
{"Total Hours", each List.Sum([No of Hours]), type nullable number}}),
AddDivision = Table.AddColumn(GroupRows, "Rate", each [Picked Qty] / [Total Hours], type number)
in
AddDivision
Question
Sometimes my raw data doesn't include a second page of data, so there's no need for Query2.
When this happens, if I don't manually add the headers for the second page I get an error: [Expression Error] The column 'Date' of the table wasn't found
.
How do I avoid this? The error appears in Query2
with RemoveOtherColumns
- without column headers it can't find the correct column, and in Query3
as it can't append a query that's returning an error.
CodePudding user response:
Without re-writing all of it, you could just change the last line of Query2 to be
in try ChangeType otherwise Table.FromRecords({[Date = null, Type = null, USER_ID=null, PICKED_QTY=null, No of Hours = null]})
or
in try ChangeType otherwise Table.Skip(Table.FromRecords({[Date = null, Type = null, USER_ID=null, PICKED_QTY=null, No of Hours = null]}),1)
creating:
or just do the whole thing in one query
let Source = Csv.Document(File.Contents("C:\temp2\data.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Removed Top Rows" = Table.Skip(Source,1),
#"Filtered Rows" = Table.PromoteHeaders(Table.SelectRows(#"Removed Top Rows", each [Column1] = ""), [PromoteAllScalars=true]),
#"Filtered Rows2" = Table.PromoteHeaders(Table.SelectRows(#"Removed Top Rows", each [Column1] <> ""), [PromoteAllScalars=true]),
AppendQueries = Table.Combine({#"Filtered Rows",#"Filtered Rows2"}),
SortRows = Table.Sort(AppendQueries,{{"Date", Order.Ascending}}),
#"Changed Type1" = Table.TransformColumnTypes(SortRows,{{"PICKED_QTY", type number}, {"No of Hours", type number}}),
GroupRows = Table.Group(#"Changed Type1", {"Date", "Type"}, {{"Picked Qty", each List.Sum([PICKED_QTY]), type number}, {"Total Hours", each List.Sum([No of Hours]), type number}}),
AddDivision = Table.AddColumn(GroupRows, "Rate", each [Picked Qty] / [Total Hours], type number)
in AddDivision