Home > Software engineering >  Timetable outerjoin without sorting on keys
Timetable outerjoin without sorting on keys

Time:11-29

I am trying to find a way to perform an outerjoin of two timetables containing the same variables, while not sorting the merged rows of keys based on value. I want to give preference to values from table1 (high quality data) over table2 (low quality data) where the same time sample is present in both tables.

The closest I can find is Merge tables Without sorting on Keys, but this does not work for my situation as I must order the timetable by time in order to use retime, which I need to condense the duplicated time samples.

e.g.

table1 = array2timetable([2;5], 'RowTimes', datetime(2000:2001,1,1), 'VariableNames', {'A'})
table2 = array2timetable([1;3], 'RowTimes', datetime(2001:2002,1,1), 'VariableNames', {'A'})

mergedTable = outerjoin(table1, table2, 'MergeKeys', true, 'Keys', {'Time', 'A'})

table1 =

  2×1 timetable

       Time        A
    ___________    _

    01-Jan-2000    2
    01-Jan-2001    5


table2 =

  2×1 timetable

       Time        A
    ___________    _

    01-Jan-2001    1
    01-Jan-2002    3


mergedTable =

  4×1 timetable

       Time        A
    ___________    _

    01-Jan-2000    2
    01-Jan-2001    1
    01-Jan-2001    5
    01-Jan-2002    3

My desired output is:

       Time        A
    ___________    _

    01-Jan-2000    2 <- only in table1
    01-Jan-2001    5 <- table1 row first, regardless of value
    01-Jan-2001    1 <- table2 row second, regardless of value
    01-Jan-2002    3 <- only in table2

This would allow me to use retime to get a set of values for all available time samples, preferring table1 where they overlap:

retime(mergedTable, unique(mergedTable.Time), 'firstvalues')

       Time        A
    ___________    _

    01-Jan-2000    2
    01-Jan-2001    5
    01-Jan-2002    3

Maybe there's a better approach than outerjoin and retime?

CodePudding user response:

You could add a column to both source tables with the table data quality "ranking". Include this column in your join. Then after the join you can re-sort by the quality ranking and then the time, and you'd end up with your desired ordering.

table1 = array2timetable([2;5], 'RowTimes', datetime(2000:2001,1,1), 'VariableNames', {'A'})
table2 = array2timetable([1;3], 'RowTimes', datetime(2001:2002,1,1), 'VariableNames', {'A'})

% Assign quality score to the input tables
table1.Quality(:) = 1;
table2.Quality(:) = 2;

mergedTable = outerjoin(table1, table2, 'MergeKeys', true, 'Keys', {'Time', 'A', 'Quality'})

% Re-sort the table to prioritise higher quality data
mergedTable = sortrows( mergedTable, {'Quality', 'Time'} )

Output:

mergedTable =
  4×2 timetable
        Time       A    Quality
    ___________    _    _______
    01-Jan-2000    2    1      
    01-Jan-2001    5    1      
    01-Jan-2001    1    2      
    01-Jan-2002    3    2   

If you want to remove the helper column afterwards, that is easy

mergedTable.Quality = [];
  • Related