I have two excel workbooks that contain information about when a truck departs a depo and one where it is received at another. Each file contains the following in separate columns:
Departure: Departure Date, Departure Time, Truck ID, Cargo info
Receive: Arrival Date, Arrival Time, Truck ID
How can I merge these two tables so the receiving table can be populated with the cargo from the departure table using Power Query?
As you can see, there is sometimes the same truck on separate trips on the same date, and therefore, it would be great to allocate the cargo based on the closest date and time for a particular truck ID. Clearly a truck cannot arrive before it has left.
I want to try and do this using Power Query, but I have been scratching my head on how to do it. Any help is greatly appreciated.
My two toy data files can be downloaded
Now you can reference it twice from this point. Filter by Arrivals in one case and filter by Departures by the other, then merge using Index as ID:
CodePudding user response:
Powerquery version:
Load Departure table, here Table1 into Powerquery
Transform data type for date column to date, and time column to time
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}, {"Truck_ID", type text}, {"Cargo", type text}})
in #"Changed Type"
File .. close and load .. connection only
Load Receive table into Powerquery
Transform data type for date column to date, and time column to time
Add column, custom column
= (i)=> Table.FirstN(Table.Sort(Table.SelectRows(Table1, each [Date] = i[Date] and [Truck_ID] = i[Truck_ID] and [Time]<=i[Time]),{{"Time", Order.Descending}}),1)
What this does is find all the rows from Table1 (departures) with same date, same truck ID, and Time less than or equal to receive time. It then sorts in descending time order, and takes first row, thus finding closest time.
Finally, expand the cargo column using arrows atop the added new column
Full code for Table2 (Receive table)
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}, {"Truck_ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", (i)=> Table.FirstN(Table.Sort(Table.SelectRows(Table1, each [Date] = i[Date] and [Truck_ID] = i[Truck_ID] and [Time]<=i[Time]),{{"Time", Order.Descending}}),1)),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Cargo"}, {"Cargo"})
in #"Expanded Custom"
CodePudding user response:
And here's another method using Sorting and Grouping to develop the list Algorithm explained in code comments
MCode
let
//Generated by the UI if you select to get Data from Folder
//and the folder has only your tables to process.
//If there are other tables, add a Filter to just select the two you want.
Source = Folder.Files("C:\Users\ron\Desktop\BillyJo"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
//Keep only the Source.Name and Transform Files columns
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
//Combine the two tables
combin = Table.Combine(#"Renamed Columns1"[Transform File]),
#"Changed Type" = Table.TransformColumnTypes(combin,{{"Date", type date}, {"Time", type time}}),
//create dateTime column to more easily compare the times
fullDate = Table.AddColumn(#"Changed Type","fullDate", each [Date] & [Time], type datetime),
//sort by full date ascending
#"Sorted Rows" = Table.Sort(fullDate,{{"fullDate", Order.Ascending}}),
//group by Truck_ID, then fill down the Cargo column and extract the Even rows for the Received table
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"Truck_ID"}, {
{"all", each Table.AlternateRows(Table.FillDown(_,{"Cargo"}),0,1,1),
type table [Date=nullable date, Time=nullable time, Truck_ID=text, Cargo=nullable text, fullDate=datetime]}
}),
//Remove unneeded column and
//Expand the table produced by the Grouping
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Truck_ID"}),
#"Expanded all" = Table.ExpandTableColumn(#"Removed Columns", "all", {"Date", "Time", "Truck_ID", "Cargo"}, {"Date", "Time", "Truck_ID", "Cargo"})
in
#"Expanded all"