How to recursively parse xml data into List of Map of Strings using xpath?

Time:01-28

Is it possible to recursively convert XML data of any depth to a List of Maps of Strings in Java?

This is my XML data:

     <?xml version="1.0" encoding="UTF-8"?>
      <p:PersonalDetails>
      <Node_1>
        <Node_1_1>
          <name>name 1</name>
          <address>
            <street>17</street>
            <town>1507487</town>
          </address>
          <details>
            <detail_1>detaile item 1</detail_1>
            <detail_2>
                <detail_2_1>detail item 2_1</detail_2_1>
                <detail_2_2>detail item 2_1</detail_2_2>
            </detail_2>
           </details>
         </Node_1_1>
         <Node_1_2>
           <name>name 1</name>
           <address>
              <street>17</street>
              <town>1507487</town>
            </address>
           <details>
             <detail_1>
                <detail_1_1>
                    <detail_1_1_1>detail item 2_1_1</detail_1_1_1>
                </detail_1_1>
                <detail_1_2>detail item 2_1</detail_1_2>
              </detail_1>
              <detail_2>
                <detail_2_1>
                    <detail_2_1_1>
                        <detail_2_1_1_1>detail item 2_1_1_1</detail_2_1_1_1>
                    </detail_2_1_1>
                </detail_2_1>
              </detail_2>
            </details>
       </Node_1_2>
    </Node_1>
</p:PersonalDetails>

I am able to convert to a list of Map of Strings with this code:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    public static void testXpath(String filePath, String expr, String childSubNodeName)
            throws ParserConfigurationException, XPathExpressionException, IOException, SAXException {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(filePath);
        xmlDocument.getDocumentElement().normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.compile("//" + expr)
                .evaluate(xmlDocument, XPathConstants.NODESET);

        List<Map<String, String>> listMap = new LinkedList<>();
        for (int i = 0; i < nodeList.getLength(); i++) {
            // getChildNodes() avoids casting a Node to NodeList, which only works with some parsers
            NodeList childNode = nodeList.item(i).getChildNodes();
            Map<String, String> map = new HashMap<>();

            for (int j = 0; j < childNode.getLength(); j++) {
                if (!childNode.item(j).getTextContent().equals("\n")) {
                    if (childNode.item(j).getNodeName().contains(childSubNodeName)) {
                        // second level: keys become parent.child
                        extractSubNode(childNode.item(j), map);
                    } else {
                        map.put(childNode.item(j).getNodeName(), childNode.item(j).getTextContent());
                    }
                }
            }
            listMap.add(map);
        }
        System.out.println(listMap);
        System.out.println("-------------------------");
    }

    private static void extractSubNode(Node item, Map<String, String> map) {
        NodeList subNode = item.getChildNodes();
        for (int j = 0; j < subNode.getLength(); j++) {
            if (!subNode.item(j).getTextContent().equals("\n")) {
                map.put(item.getNodeName() + "." + subNode.item(j).getNodeName(),
                        subNode.item(j).getTextContent());
            }
        }
    }

But I can only extract two levels deep. Is there a way to extract XML data to any depth?

I am expecting a List of Map of Strings:

  [{name=name 1, address.street=17,address.town=1507487,...}]

Thanks.

CodePudding user response:

Your desired mapping is not clear, and your input sample is neither complete nor well-formed. However, XPath 3.1, as supported by e.g. Saxon 9.8 or later, can return a sequence of maps from a single expression, e.g.

    for $node in outermost(//Node_1/Node_1_1)
    return
      let $leaves := $node//*[not(*)]
      return
        $node!map:merge(
          $leaves!map:entry(string-join((ancestor-or-self::* except $node/ancestor-or-self::*)/local-name(), '.'), string())
        )

Evaluated at the XPath 3.1 fiddle, this gives e.g.

    {"address.town":"1507487","details.detail_2.detail_2_1":"detail item 2_1","details.detail_2.detail_2_2":"detail item 2_1","address.street":"17","details.detail_1":"detaile item 1","name":"name 1"}
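
To make the logic of that expression easier to follow, here is a rough sketch of the same two steps using only the JDK's XPath 1.0 support: select the leaf elements with `.//*[not(*)]`, then build each key from the names between the record node and the leaf. The class name `LeafPaths` and the methods `leafMap`, `pathBelow` and `parse` are made up for this illustration. It uses `getNodeName()` with a parser that is not namespace-aware, rather than `local-name()`, so treat it as an approximation of the XPath 3.1 version, not an equivalent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original answer.
class LeafPaths {

    /** Maps every leaf element below root to the dotted path of names leading to it. */
    static Map<String, String> leafMap(Node root) throws Exception {
        // Same leaf test as in the XPath 3.1 expression: descendants without element children.
        NodeList leaves = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(".//*[not(*)]", root, XPathConstants.NODESET);
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < leaves.getLength(); i++) {
            Node leaf = leaves.item(i);
            map.put(pathBelow(root, leaf), leaf.getTextContent());
        }
        return map;
    }

    /** Joins the names of leaf and its ancestors up to, but excluding, root. */
    static String pathBelow(Node root, Node leaf) {
        StringBuilder path = new StringBuilder(leaf.getNodeName());
        for (Node n = leaf.getParentNode(); n != null && !n.isSameNode(root); n = n.getParentNode()) {
            path.insert(0, n.getNodeName() + ".");
        }
        return path.toString();
    }

    /** Parses an XML string into a DOM document (not namespace-aware by default). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Cut-down, well-formed version of one record from the question.
        String xml = "<Node_1_1><name>name 1</name>"
                + "<address><street>17</street><town>1507487</town></address>"
                + "<details><detail_2><detail_2_1>detail item 2_1</detail_2_1></detail_2></details>"
                + "</Node_1_1>";
        System.out.println(leafMap(parse(xml).getDocumentElement()));
    }
}
```

If two leaves end up with the same dotted path (repeated sibling elements), the later value overwrites the earlier one in this sketch, whereas `map:merge` by default keeps the first.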

Example Java code using s9api:

    Processor processor = new Processor(false);

    DocumentBuilder docBuilder = processor.newDocumentBuilder();

    XdmNode inputDoc = docBuilder.build(new File("sample1.xml"));

    String xpathExpresssion = "array {\n" +
            "    for $node in outermost(//Node_1/Node_1_1)\n" +
            "    return \n" +
            "      let $leaves := $node//*[not(*)]\n" +
            "    return\n" +
            "      $node!map:merge(\n" +
            "        $leaves!map:entry(string-join((ancestor-or-self::* except $node/ancestor-or-self::*)/local-name(), '.'), string())\n" +
            "      )\n" +
            "}";

    XPathCompiler xpathCompiler = processor.newXPathCompiler();
    xpathCompiler.declareNamespace("map", "http://www.w3.org/2005/xpath-functions/map");

    XdmItem result = xpathCompiler.evaluateSingle(xpathExpresssion, inputDoc);

    System.out.println(result);

To serialize as JSON from Java, add

    StringWriter sw = new StringWriter();
    Serializer serializer = processor.newSerializer(sw);
    serializer.setOutputProperty(Serializer.Property.METHOD, "json");

    processor.writeXdmValue(result, serializer);

    String resultString = sw.toString();

    System.out.println(resultString);

It might even be possible to get a single string result through the JAXP API with XPath 3.1 (or, of course, through s9api) by calling the XPath `serialize` function, e.g. by evaluating

    serialize(array {
        for $node in outermost(//Node_1/Node_1_1)
        return
          let $leaves := $node//*[not(*)]
          return
            $node!map:merge(
              $leaves!map:entry(string-join((ancestor-or-self::* except $node/ancestor-or-self::*)/local-name(), '.'), string())
            )
    }, map { 'method':'json'})

Java code:

    XPathFactory saxonXPathFactory = new net.sf.saxon.xpath.XPathFactoryImpl();

    String xpath31Expresssion = "serialize(array {\n" +
            "    for $node in outermost(//Node_1/Node_1_1)\n" +
            "    return \n" +
            "      let $leaves := $node//*[not(*)]\n" +
            "    return\n" +
            "      $node!map:merge(\n" +
            "        $leaves!map:entry(string-join((ancestor-or-self::* except $node/ancestor-or-self::*)/local-name(), '.'), string())\n" +
            "      )\n" +
            "}, map { 'method':'json'})";

    XPath xpath = saxonXPathFactory.newXPath();

    xpath.setNamespaceContext(new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix.equals("map"))
                return "http://www.w3.org/2005/xpath-functions/map";
            return null;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return null;
        }
    });

    String result = xpath.evaluate(xpath31Expresssion, new InputSource("sample1.xml"));

    System.out.println(result);
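
If adding Saxon is not an option, a plain recursive walk over the DOM can produce the same kind of dotted keys to any depth. The following is only a sketch of that alternative, not the approach shown above: the class name `XmlFlattener` and its methods `flattenAll`, `flatten`, `hasElementChild` and `parse` are invented for this example, and it assumes that only leaf elements carry values, that mixed content can be ignored, and that repeated sibling names may overwrite each other.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original answer.
class XmlFlattener {

    /** Returns one flattened map per element matched by the XPath 1.0 expression. */
    static List<Map<String, String>> flattenAll(Document doc, String expr) throws Exception {
        NodeList records = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            Map<String, String> map = new LinkedHashMap<>();
            flatten(records.item(i), "", map);
            result.add(map);
        }
        return result;
    }

    /** Recursively visits child elements, extending the dotted key at each level. */
    static void flatten(Node parent, String prefix, Map<String, String> out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, etc.
            }
            String key = prefix.isEmpty() ? child.getNodeName() : prefix + "." + child.getNodeName();
            if (hasElementChild(child)) {
                flatten(child, key, out); // descend one level deeper
            } else {
                out.put(key, child.getTextContent().trim()); // leaf: store its text
            }
        }
    }

    /** True if the node has at least one child that is an element. */
    static boolean hasElementChild(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string into a DOM document (not namespace-aware by default). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Cut-down, well-formed version of the sample from the question.
        String xml = "<Node_1><Node_1_1><name>name 1</name>"
                + "<address><street>17</street><town>1507487</town></address>"
                + "<details><detail_2><detail_2_1>detail item 2_1</detail_2_1></detail_2></details>"
                + "</Node_1_1></Node_1>";
        System.out.println(flattenAll(parse(xml), "//Node_1/Node_1_1"));
        // prints [{name=name 1, address.street=17, address.town=1507487, details.detail_2.detail_2_1=detail item 2_1}]
    }
}
```

The recursion depth equals the element nesting depth, so this should be fine for ordinary documents. For namespaced input you would probably want a namespace-aware parser and `getLocalName()` instead of `getNodeName()`.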