I am trying to find a way to perform an outerjoin of two timetables containing the same variables, while not sorting the merged rows of keys based on value. I want to give preference to values from table1 (high quality data) over table2 (low quality data) where the same time sample is present in both tables.
The closest I can find is Merge tables Without sorting on Keys, but this does not work for my situation as I must order the timetable by time in order to use retime, which I need to condense the duplicated time samples.
e.g.
table1 = array2timetable([2;5], 'RowTimes', datetime(2000:2001,1,1), 'VariableNames', {'A'})
table2 = array2timetable([1;3], 'RowTimes', datetime(2001:2002,1,1), 'VariableNames', {'A'})
mergedTable = outerjoin(table1, table2, 'MergeKeys', true, 'Keys', {'Time', 'A'})
table1 =
2×1 timetable
Time A
___________ _
01-Jan-2000 2
01-Jan-2001 5
table2 =
2×1 timetable
Time A
___________ _
01-Jan-2001 1
01-Jan-2002 3
mergedTable =
4×1 timetable
Time A
___________ _
01-Jan-2000 2
01-Jan-2001 1
01-Jan-2001 5
01-Jan-2002 3
My desired output is:
Time A
___________ _
01-Jan-2000 2 <- only in table1
01-Jan-2001 5 <- table1 row first, regardless of value
01-Jan-2001 1 <- table2 row second, regardless of value
01-Jan-2002 3 <- only in table2
This would allow me to use retime to get a set of values for all available time samples, preferring table1 where they overlap:
retime(mergedTable, unique(mergedTable.Time), 'firstvalues')
Time A
___________ _
01-Jan-2000 2
01-Jan-2001 5
01-Jan-2002 3
Maybe there's a better approach than outerjoin and retime?
CodePudding user response:
You could add a column to both source tables with the table data quality "ranking". Include this column in your join. Then after the join you can re-sort by the quality ranking and then the time, and you'd end up with your desired ordering.
table1 = array2timetable([2;5], 'RowTimes', datetime(2000:2001,1,1), 'VariableNames', {'A'})
table2 = array2timetable([1;3], 'RowTimes', datetime(2001:2002,1,1), 'VariableNames', {'A'})
% Assign quality score to the input tables
table1.Quality(:) = 1;
table2.Quality(:) = 2;
mergedTable = outerjoin(table1, table2, 'MergeKeys', true, 'Keys', {'Time', 'A', 'Quality'})
% Re-sort the table to prioritise higher quality data
mergedTable = sortrows( mergedTable, {'Quality', 'Time'} )
Output:
mergedTable =
4×2 timetable
Time A Quality
___________ _ _______
01-Jan-2000 2 1
01-Jan-2001 5 1
01-Jan-2001 1 2
01-Jan-2002 3 2
If you want to remove the helper column afterwards, that is easy
mergedTable.Quality = [];