Home > database >  Order of XML nodes from document preserved in insert?
Order of XML nodes from document preserved in insert?

Time:08-19

If I do:

INSERT INTO dst
SELECT blah
FROM src
CROSS APPLY xmlcolumn.nodes('blah')

where dst has an identity column, can one say for certain that the identity column order matches the order of the nodes from the original xml document?

I think the answer is no, there are no guarantees and that to ensure the ordering is able to be retained, some ordering information needs to also be extracted from the XML at the same time the nodes are enumerated.

CodePudding user response:

There's no way to see it explicitly in an execution plan, but the id column returned by the nodes() method is a varbinary(900) OrdPath, which does encapsulate the original xml document order.

The solution offered by Mikael Eriksson on the related question Does the `nodes()` method keep the document order? relies on the OrdPath to provide an ORDER BY clause necessary to determine how identity values are assigned for the INSERT.

A slightly more compact usage follows:

CREATE TABLE #T 
(
    ID integer IDENTITY, 
    Fruit nvarchar(10) NOT NULL
);

DECLARE @xml xml = 
    N'
    <Fruits>
      <Apple />
      <Banana />
      <Orange />
      <Pear />
    </Fruits>
    ';

INSERT #T 
    (Fruit)
SELECT 
    N.n.value('local-name(.)', 'nvarchar(10)') 
FROM @xml.nodes('/Fruits/*') AS N (n)
ORDER BY
    ROW_NUMBER() OVER (ORDER BY N.n);

SELECT 
    T.ID, 
    T.Fruit 
FROM #T AS T
ORDER BY
    T.ID;

db<>fiddle

Using the OrdPath this way is presently undocumented, but the technique is sound in principle:

  1. The OrdPath reflects document order.
  2. The ROW_NUMBER computes sequence values ordered by OrdPath.
  3. The ORDER BY clause uses the row number sequence.
  4. Identity values are assigned to rows as per the ORDER BY.

To be clear, this holds even if parallelism is employed. As Mikael says, the dubious aspect is using id in the ROW_NUMBER since id is not documented to be the OrdPath.

CodePudding user response:

Under the current implementation of .nodes, the XML nodes are generated in document order. The result of that is always joined to the original data using a nested loops, which always runs in order also.

Furthermore, inserts are generally serial (except under very specific circumstances that it goes parallel, usually when you have an empty table, and never with an IDENTITY value being generated).

Therefore there is no reason why the server would ever return rows in a different order than the document order. You can see from this fiddle that that is what happens.

That being said, there is no guarantee that the implementation of .nodes won't change, or that inserts may in future go parallel, as neither of these is documented anywhere as being guaranteed. So I wouldn't rely on it without an explicit ORDER BY, and you do not have a column to order it on.

Using an ORDER BY would guarantee it. The docs state: "INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted."


Even using ROW_NUMBER as some have recommended is also not guaranteed. The only real solution is to get the document order directly from XQuery.

The problem is that SQL Server's version of XQuery does not allow using position(.) as a result, only as a predicate. Instead, you can use a hack involving the << positional operator.

For example:

SELECT T.X.value('text()[1]', 'nvarchar(100)') as RowLabel,
       T.X.value('let $i := . return count(../*[. << $i])   1', 'int') as RowNumber
FROM src
CROSS APPLY xmlcolumn.nodes('blah') as T(X);

What this does is:

  • Assign the current node . to the variable $i
  • Takes all the nodes in ../* i.e. all children of the parent of this node
  • ... [. << $i] that are previous to $i
  • and counts them
  • Then add 1 to make it one-based
  • Related