Extract attributes from xml in nifi

Time:12-18

I have XML files that I retrieve from FTP (with the ListFTP and FetchFTP processors). I want to extract the values from each XML file and replace the file's content with those values, as if it were a CSV, and then put the result back on FTP with the PutFTP processor.

The desired output is something like this:

{"foodate":"somedate","name":"fooid1_foovalue","value":some-numbers}
{"foodate":"somedate","name":"fooid1_metrics","value":some-metrics}
.
.
.
{"foodate":"somedate","name":"fooid2_foovalue","value":some-numbers}
.
.
.

So for each id, write the foodate attribute first, then the id combined with each Sample attribute in turn: id1 with sample attribute 1, id1 with sample attribute 2, and so on.

However, I don't know in advance what the attributes will be named or how many there will be, only that the first Sample attribute will be foodate. Any idea how to proceed? I tried the ExecuteScript processor with JavaScript, but it does not seem to recognize DOMParser() etc.

<?xml version="1.0" encoding="ISO-8859-1"?>
<Document Version="2">
    <ExportData lowerBound="2021/11/24 16:58:26" upperBound="2021/11/24 22:58:26">
        <Site name="name" f="">
            <Kapta fooid1="some-number">
                <Infos>
                    <Info>
                        <EndPoint foo="value-name" />
                    </Info>
                </Infos>
                <Samples ordering="desc">
                    <Sample foodate="some-date" foovalue="some-numbers" metrics="some-metrics" metrics2="metrics-again" value="numbers5" te="numbers" />
                    <Sample foodate="some-date" foovalue="some-numbers" foom="some-metrics" metrics453="metrics-again" otherattribut="numbers5" att345="numbers" morevalues="numbers" foohdeiurf="numbers" hello="numbers"/>
                </Samples>
            </Kapta>
            <Kapta fooid2="some-number">
                <Infos>
                    <Info>
                        <EndPoint foo="value-name" />
                    </Info>
                </Infos>
                <Samples ordering="desc">
                    <Sample foodate="some-date" foovalue="some-numbers" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbersagain" />
                    <Sample foodate="some-date" foo="some-numbers" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbers" />
                    <Sample foodate="some-date" attr="some-numbers" someothermetrics="some-metrics" metr="metrics-again" anothervalue="numbers" />
                </Samples>
            </Kapta>
        </Site>
    </ExportData>
</Document>

Thanks a lot for your time and effort!

CodePudding user response:

You can use Groovy's XML parsing libraries. There are several options, depending on your needs.

Here is some experimental code: it reads the XML from the content of the incoming flow file and outputs some extracted values as a list of JSON objects, one per line. You can adapt it to your requirements.

Please note that this code may not be production grade. See the ExecuteScript cookbook for more about Groovy in NiFi.

import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilder
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import groovy.xml.dom.DOMCategory
import groovy.json.JsonGenerator
def flowFile

try {
    // Fetch the next flow file; stop quietly when the queue is empty
    flowFile = session.get()
    if (flowFile == null) return
    
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = null

    session.read(flowFile, {inputStream ->
        doc =  dBuilder.parse(inputStream)
    } as InputStreamCallback)
    
    def root = doc.documentElement
    def sb = new StringBuilder()
    def jsonGenerator = new JsonGenerator.Options().disableUnicodeEscaping().build()
    
    use(DOMCategory) {
        // Visit every child of each <Samples> element and emit one JSON line per <Sample>
        root['ExportData']['Site']['Kapta']['Samples']['*'].each { node ->
            def data = new LinkedHashMap()
            data.foodate = node['@foodate']
            data.foovalue = node['@foovalue']
            sb.append(jsonGenerator.toJson(data))
            sb.append('\n')
        }
    }
    
    flowFile = session.write(flowFile, {inputStream, outputStream ->
        outputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    
    session.transfer(flowFile, REL_SUCCESS)
    
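} catch (Exception e) {
    log.error('Failed to extract attributes from XML', e)
    // flowFile is still null if session.get() itself failed
    if (flowFile != null) {
        session.transfer(flowFile, REL_FAILURE)
    }
}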
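
The script above only picks out foodate and foovalue, while the question asks for every Sample attribute without knowing the names in advance. A minimal sketch of that part, runnable outside NiFi, might look like the following. The helper name flattenSamples is hypothetical. It assumes each Kapta element carries exactly one attribute and that the name of that attribute is the id prefix, as the placeholders in the sample suggest (use getNodeValue() instead if your real data keeps the id in the value). Values are left as JSON strings, and the DOM API does not guarantee that attributes come back in document order, so foodate is looked up by name rather than by position.

```groovy
import groovy.json.JsonOutput
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper: returns one JSON line per Sample attribute other than foodate
List<String> flattenSamples(InputStream xml) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml)
    def result = []
    def kaptas = doc.getElementsByTagName('Kapta')
    for (int k = 0; k < kaptas.getLength(); k++) {
        Element kapta = (Element) kaptas.item(k)
        // Assumption: the single attribute on <Kapta> names the id (e.g. fooid1)
        if (kapta.getAttributes().getLength() == 0) continue
        String id = kapta.getAttributes().item(0).getNodeName()
        def samples = kapta.getElementsByTagName('Sample')
        for (int s = 0; s < samples.getLength(); s++) {
            Element sample = (Element) samples.item(s)
            String date = sample.getAttribute('foodate')
            def attrs = sample.getAttributes()
            for (int a = 0; a < attrs.getLength(); a++) {
                def attr = attrs.item(a)
                // foodate goes into every record, so it gets no record of its own
                if (attr.getNodeName() == 'foodate') continue
                result << JsonOutput.toJson([
                    foodate: date,
                    name   : id + '_' + attr.getNodeName(),
                    value  : attr.getNodeValue()
                ])
            }
        }
    }
    return result
}

// Trimmed-down version of the XML from the question
def sampleXml = '''<?xml version="1.0" encoding="ISO-8859-1"?>
<Document Version="2"><ExportData><Site name="name">
  <Kapta fooid1="some-number"><Samples ordering="desc">
    <Sample foodate="some-date" foovalue="some-numbers" metrics="some-metrics"/>
  </Samples></Kapta>
</Site></ExportData></Document>'''

def jsonLines = flattenSamples(new ByteArrayInputStream(sampleXml.getBytes('ISO-8859-1')))
jsonLines.each { println it }
```

Inside ExecuteScript, the same loops could fill sb in the script above in place of the two hard-coded attributes, with the rest of the read/write/transfer logic unchanged.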
   