Select first child from unordered, prioritized list

Time:10-06

Background

Third-party software libraries sometimes offer a set of licenses from which developers may choose when using the library. Some licenses can be detected using ORT, which produces an SPDX identifier. We'd like to eliminate the manual process of selecting particular licenses each time, deferring instead to a predefined, prioritized list.

Not all licenses are included in the list.

Code

This section defines the source files.

priorities.xml

The order of entries determines the license to choose when a selection is available:

<priorities>
  <license>Apache-2.0</license>
  <license>MIT</license>
  <license>BSD-2-Clause</license>
  <license>BSD-3-Clause</license>
  <license>CDDL-1.0</license>
  <license>EPL-2.0</license>
  <license>MPL-2.0</license>
  <license>LGPL-3.0</license>
</priorities>

This document is loaded using:

  <xsl:variable
    name="PRIORITY"
    select="document( resolve-uri( 'priorities.xml', base-uri( / ) ) )" />

libraries.xml

These simplified license entries form the input document:

<copyrights>
  <copyright>
     <title>Grizzly HTTP framework</title>
     <licenses>
        <license>CDDL-1.0</license>
        <license>GPL-2.0-or-later</license>
     </licenses>
  </copyright>
  <copyright>
     <title>Java™ JSON Tools Jackson Coreutils</title>
     <licenses>
        <license>LGPL-3.0-or-later</license>
        <license>Apache-2.0</license>
     </licenses>
  </copyright>
  <copyright>
     <title>Javassist</title>
     <licenses>
        <license>LGPL-2.1-only</license>
        <license>MPL-1.1</license>
        <license>Apache-2.0</license>
     </licenses>
  </copyright>
  <copyright>
     <title>Linux Kernel</title>
     <licenses>
      <license with="Linux-syscall-note" order="before">GPL-2.0-only</license>
     </licenses>
  </copyright>
  <copyright>
    <title>Eclipse Temurin™</title>
    <licenses>
      <license with="Classpath-exception-2.0">GPL-2.0-only</license>
    </licenses>
  </copyright>
</copyrights>

Output

The desired output reduces the licenses to a single entry:

<copyrights>
  <copyright>
     <title>Grizzly HTTP framework</title>
     <licenses>
        <license>CDDL-1.0</license>
     </licenses>
  </copyright>
  <copyright>
     <title>Java™ JSON Tools Jackson Coreutils</title>
     <licenses>
        <license>Apache-2.0</license>
     </licenses>
  </copyright>
  <copyright>
     <title>Javassist</title>
     <licenses>
        <license>Apache-2.0</license>
     </licenses>
  </copyright>
  <copyright>
     <title>Linux Kernel</title>
     <licenses>
      <license with="Linux-syscall-note" order="before">GPL-2.0-only</license>
     </licenses>
  </copyright>
  <copyright>
    <title>Eclipse Temurin™</title>
    <licenses>
      <license with="Classpath-exception-2.0">GPL-2.0-only</license>
    </licenses>
  </copyright>
</copyrights>

Problem

Conceptually, I'd like to rank the licenses by the position of each input license in the priorities list, then select the first. As a series of transforms, the relevant section of the input document might start as:

 <licenses>
    <license>LGPL-2.1-only</license>
    <license>MPL-1.1</license>
    <license>Apache-2.0</license>
 </licenses>

Then we could assign the priority based on the position in the priorities list (neither LGPL-2.1-only nor MPL-1.1 appears in it, so both get INFINITY):

 <licenses>
    <license priority="INFINITY">LGPL-2.1-only</license>
    <license priority="INFINITY">MPL-1.1</license>
    <license priority="1">Apache-2.0</license>
 </licenses>

Then sort based on priority:

 <licenses>
    <license priority="1">Apache-2.0</license>
    <license priority="INFINITY">LGPL-2.1-only</license>
    <license priority="INFINITY">MPL-1.1</license>
 </licenses>

Then select the first child:

 <licenses>
    <license priority="1">Apache-2.0</license>
 </licenses>

I believe this "algorithm" would ensure that a license without a corresponding priority is still selected by default when it is the only one listed (as with the Linux Kernel and Eclipse Temurin™ entries).

Constraints

These constraints will have been met before the transformation step occurs:

  • It is an error for an entry in the input document to list multiple licenses when none of them matches the license priorities listing. I don't think we can realistically select the first license in such cases.

  • It is an error for any entry in the input document to have no license (we can enforce this using schema validation).

Question

What would be an expedient way to filter the licenses using XSLT 3.0?

-or-

How would you inject the priority attribute as shown in the algorithmic steps?

CodePudding user response:

Here's my suggestion.

To simplify my test, I defined the PRIORITY variable inline instead of using document(), but of course you can stick with your approach of reading it from an external file.

An explanation:

The priority XML document is converted to a map in which the keys are the license names and the values are the integer position of that license in the priority list. So the key 'Apache-2.0' has the value 1, etc.

The template matching licenses copies only one of the license child elements.

First it uses the XPath 3.1 sort function to sort the licenses. The second parameter is empty, which selects the default collation. The last parameter is a function which maps an item to a sort key; the supplied function looks up the item (i.e. the license name) in the $license-priority map and returns the resulting priority integer, or infinity if the name is not found in the map.

Then the first (highest priority) license from that sorted sequence is copied.

<xsl:stylesheet 
  version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:map="http://www.w3.org/2005/xpath-functions/map">

  <xsl:variable name="PRIORITY">
    <priorities>
      <license>Apache-2.0</license>
      <license>MIT</license>
      <license>BSD-2-Clause</license>
      <license>BSD-3-Clause</license>
      <license>CDDL-1.0</license>
      <license>EPL-2.0</license>
      <license>MPL-2.0</license>
      <license>LGPL-3.0</license>
    </priorities>
  </xsl:variable>
  
  <xsl:variable name="license-priority" select="
    map:merge(
      $PRIORITY//license ! map:entry(., position())
    )
  "/>

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:template match="licenses" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:copy>
      <xsl:copy-of select="
        sort(
          license,
          (),
          function($license) {
            ($license-priority($license), xs:double('INF'))[1]
          }
        )[1]
      "/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>
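
For the alternative form of the question (injecting the priority attribute), a minimal sketch along the same lines might look like this. It reuses the inline PRIORITY variable and the $license-priority map from the stylesheet above. The license template and the 'INFINITY' placeholder string follow the question's illustration and are assumptions, not part of the answer as given.

```xml
<xsl:stylesheet
  version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map">

  <!-- Inline copy of priorities.xml, as in the stylesheet above. -->
  <xsl:variable name="PRIORITY">
    <priorities>
      <license>Apache-2.0</license>
      <license>MIT</license>
      <license>BSD-2-Clause</license>
      <license>BSD-3-Clause</license>
      <license>CDDL-1.0</license>
      <license>EPL-2.0</license>
      <license>MPL-2.0</license>
      <license>LGPL-3.0</license>
    </priorities>
  </xsl:variable>

  <!-- License name mapped to its 1-based position in the list. -->
  <xsl:variable name="license-priority" select="
    map:merge(
      $PRIORITY//license ! map:entry( string( . ), position() )
    )
  "/>

  <!-- Everything not matched below is copied unchanged. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="licenses/license">
    <xsl:copy>
      <!-- Keep existing attributes such as @with and @order. -->
      <xsl:copy-of select="@*"/>
      <!-- Position in the list, or the placeholder (assumed here to be
           the literal string 'INFINITY') when the name is not listed. -->
      <xsl:attribute name="priority" select="
        ( $license-priority( string( . ) ), 'INFINITY' )[1]
      "/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this, Apache-2.0 would be annotated with priority="1" and any unlisted license with priority="INFINITY". Selecting the best entry would then need a second pass, whereas the sort-based stylesheet above does both steps at once without the intermediate attribute.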

CodePudding user response:

Isn't it just

<xsl:template match="copyright/licenses">
  <licenses>
    <xsl:copy-of 
       select="doc('priorities.xml')
          //license[. = current()/license][1]"/>
  </licenses>
</xsl:template>

or have I missed something?

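This takes advantage of the ability to do a one-to-many comparison using "=": the predicate is true for a priorities entry if it equals any of the license children of the current licenses element.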
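
One case worth checking against the desired output: for entries whose only license is absent from priorities.xml (Linux Kernel, Eclipse Temurin™), the predicate selects nothing, so the licenses element comes out empty. A possible variant, offered as a sketch rather than a tested fix, copies the input's own license element and falls back to the first input license when nothing matches. The variable names ranked, candidates, and preferred are made up for illustration, and doc('priorities.xml') is assumed to resolve relative to the stylesheet.

```xml
<xsl:stylesheet
  version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Everything not matched below is copied unchanged. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Priority entries in document order (hypothetical variable name). -->
  <xsl:variable name="ranked" select="doc('priorities.xml')//license"/>

  <xsl:template match="copyright/licenses">
    <xsl:variable name="candidates" select="license"/>

    <!-- First priority entry that equals any candidate; "=" compares
         one value against the whole candidate sequence. -->
    <xsl:variable name="preferred"
      select="$ranked[ . = $candidates ][1]"/>

    <xsl:copy>
      <!-- Copy the input's own element so attributes such as @with
           survive. If nothing matched, $preferred is empty, the first
           operand is empty, and the first candidate is used instead. -->
      <xsl:copy-of select="
        ( $candidates[ . = $preferred ], $candidates )[1]
      "/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The fallback leans on the stated constraints: an entry with several licenses and no match in the priorities listing is rejected before the transformation runs, so the fallback only ever applies to a single unlisted license.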
