Why is a full TADIR select with ORDER BY PRIMARY KEY much slower than INTO sorted table?-CodePudding

Having the following statements:

SELECT * FROM tadir ORDER BY PRIMARY KEY INTO TABLE @DATA(lt_tadir).

DATA lt_tadir TYPE SORTED TABLE OF tadir WITH UNIQUE KEY pgmid object obj_name.`
SELECT * FROM tadir INTO TABLE @lt_tadir.

Why is the first one around 4 times slower (verified on multiple systems? Relevant statements from the documentation:

For performance reasons, a sort should only take place in the database if supported by an index. This guaranteed only when ORDER BY PRIMARY KEY is specified

If a sorted resulting set is assigned to a sorted internal table, the internal table is sorted again according to the sorting instructions.

First I thought maybe column storage is an issue, but I tried another column storage table where both statements are around similar (even though the second one seems to be a bit faster each time). Same for the row storage I tested.

Also I somehow would have expected that the sort in ABAP (second Docu snipped) would have an performance impact. But it outperforms the primary key index select.

Tables aren't fully buffered, and even if that was the case, ORDER BY PRIMARY KEY does use the buffer.

Any ideas?

CodePudding user response：

I ran the test (several times) and got at least similar execution times for both statements. The Application server sort version about 25% faster than HANA sort. So my mileage was different.

That HANA sort isnt "faster" is only mildly surprising until you look at the table definition. Sorting the entire inverted hash index not what it was designed for. :)

Some "rules" are meant to be broken.
Sorting 5 Million keys with inverted hashes might a good example. And now you have 5 Million records in memory, reading rows quickly by key will favor the internally sorted table. anyway ;)

DATA lt_tadir TYPE SORTED TABLE OF tadir WITH UNIQUE KEY pgmid object obj_name

is access friendly than the simple Standard Table anyway. @Data(lt_tab)

There are known disadvantages with inverted hash indexes. With typical access not normally noticed. https://www.stechies.com/hana-inverted-individual-indexes-interview-questionsand-ans/

CodePudding user response：

I'd like to argue that this performance test is senseless and any conclusions drawn from it are incorrect as:

Sorting in ABAP and sorting on the database only yields the same results if the whole table is selected. If the number of results is limited (e.g. get the last 100 invoices) then sorting on the database yields the correct result while sorting in ABAP (after the limit) does not. As one rarely selects a whole table the test case is completely unrealistic.
Sorting by the primary key (pgmid, object, obj_name) does not make any sense. In which scenario would that be useful? You might want to search for a certain object, then sorting by obj_name might be useful, or you might want to see recent transported objects and sort by the correction (korrnum)

Just to demonstrate run the following example:

REPORT Z_SORT_PERF.

DATA:
  start TYPE timestampl,
  end   TYPE timestampl.

DATA(limit) = 10000.

* ------------- HANA UNCOMMON SORT ----------------------------------------

GET TIME STAMP FIELD start.
SELECT * FROM tadir ORDER BY PRIMARY KEY INTO TABLE @DATA(hana_uncommon) UP TO @limit ROWS.
GET TIME STAMP FIELD end.

WRITE |HANA uncommon sort: { cl_abap_tstmp=>subtract( tstmp1 = end tstmp2 = start ) }s|.
NEW-LINE.


* ------------- HANA COMMON SORT ----------------------------------------

GET TIME STAMP FIELD start.
SELECT * FROM tadir ORDER BY KORRNUM INTO TABLE @DATA(hana_common) UP TO @limit ROWS.
GET TIME STAMP FIELD end.

WRITE |HANA common sort: { cl_abap_tstmp=>subtract( tstmp1 = end tstmp2 = start ) }s|.
NEW-LINE.

On my test system this runs in 1.034s (the uncommon one) vs 0.08s (the common one). That's 10 times faster (though that comparison also makes little sense). Why is that? Because there is a secondary index defined for the KORRNUM field for exactly that purpose (sorting) while the primary key index is supposed for enforcing uniqueness constraints and retrieving single records.

In general a database is not meant to optimize for one single query, but is optimized for overall system performance (under memory constraints). This is usually achieved by sacrificing the performance of uncommon queries for the optimization of common ones.

For performance reasons, a sort should only take place in the database if supported by an index. This guaranteed only when ORDER BY PRIMARY KEY is specified

That's a slightly misleading formulation. There are indices that are not optimal for sorting, and although there is no guarantee that a database creates a secondary index and uses it while sorting, it is very likely that it does so in common cases.