How to validate single entries of an XML structure against an XSD schema

Time:11-04

In my application, I have almost 1 million entries in my DB. This data is transformed into one or more XML files, which are finally validated against an XSD. If validation fails, the XML file cannot be sent to its destination.

I don't like this all-or-nothing approach, which fails only after a long processing run. Since the data arrives (from the DB) over the course of the day, is there a way to validate each entry on its own? I don't want to create a file for each entry because of performance concerns, so I wonder whether the XSD can be loaded into a Java object and then used to validate entries individually inside the code.

Can you help?

CodePudding user response:

This is more of an architecture question than a development question. Here are a few ideas:

  1. As the data comes into the database, you could publish it to a Kafka topic. A subscriber would consume each record as it arrives on the topic, validate it, and either write it to the final file or raise an alert for bad records. You could drive this from a database trigger, if your database supports them.

  2. You don't describe how the data is extracted from the database into the XML files. Maybe you can use paging there and create smaller files.

  3. You could use something like Apache Spark, which would read the data from the database over a JDBC connection, modify the internal representation in a dataframe, and then write the file directly. 1 million entries (depending on how wide your records are) is nothing for Spark.

  4. Some databases support user-defined functions in Java, so you could run the XSD validation directly at the database level (really not my favorite, but still an option).
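
Whichever of these options you pick, the per-record validation step itself can be done in memory with the standard javax.xml.validation API, without writing a file per entry. The following is only a minimal sketch under some assumptions: the class name RecordValidator is hypothetical, each entry is available as an XML string, and the record element is declared as a global (top-level) element in the XSD, since otherwise it cannot be validated on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: compiles the XSD once, then validates one record at a time in memory. */
final class RecordValidator {
    // A compiled Schema is immutable and can be shared between threads.
    private final Schema schema;

    RecordValidator(String xsdText) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            this.schema = factory.newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("XSD could not be compiled", e);
        }
    }

    /** Returns the validation messages for one record; an empty list means the record is valid. */
    List<String> validate(String recordXml) {
        List<String> problems = new ArrayList<>();
        // Validators are not thread-safe, so a fresh one is created per call.
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* warnings are ignored here */ }
            // Collect schema violations instead of stopping at the first one.
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            // Fatal errors (for example, XML that is not well-formed) abort this record.
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(recordXml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }
}
```

A Kafka subscriber or a paging loop could then call validate(...) on each entry and route anything with a non-empty result to an alert or a reject table, so that only valid records reach the final file.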

Notes:

  • You probably know this already, but be careful about creating millions of small files; that would kill your system (hence the Kafka recommendation).
  • Recommendations can vary depending on whether you're on-prem or in the cloud, where you can leverage some PaaS services.

CodePudding user response:

You could feed the data into a streaming schema-aware XSLT 3.0 transformation whose logic is

<xsl:mode streamable="yes"/>
<xsl:template match="record">
  <xsl:try>
    <xsl:copy-of select="." validation="strict"/>
    <xsl:catch errors="*"/>
  </xsl:try>
</xsl:template>
 

and (if using Saxon) you could capture the validation errors by supplying an InvalidityHandler which would be notified each time invalid data is encountered.
