Parsing a Fixed length Flat xml file in spring batch

Time:11-02

My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

This is a fixed-length flat file wrapped in XML tags, and I need to parse each record line. For example,

ABC123411/10/20

as

name: ABC
id: 1234
Date: 11/10/20

and map it to the corresponding object.

This is what I'm trying:

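<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
    <property name="resource" value="/file path" />
    <property name="linesToSkip" value="2" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
                    <property name="names"
                              value="name,id,date"/>
                    <property name="columns"
                              value="1-3,4-7,8-15"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <!-- Parse the object -->
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <property name="prototypeBeanName" value="testRecord" />
                </bean>
            </property>
        </bean>
    </property>
</bean>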

Problem 1:

How can I ignore the first two lines and the last line of the file, i.e. the XML tags? linesToSkip only skips lines at the start of the file, which I'm already doing. Is there another way to ignore these lines?

I want to read only the contents between the File tags, parse each line by fixed length, and pass the results as a list to my processor bean. I believe my configuration skips the first two lines, but I'm not sure how to ignore the last line. Is there a better way to do this?

I don't think we can use StaxEventItemReader, since the File tag contains a list of records that need to be parsed rather than an XML object. Please correct me if I'm wrong.

CodePudding user response:

I successfully extracted the contents using RegexLineTokenizer instead of FixedLengthTokenizer. Setting strict to false prevents it from choking on lines that do not match the pattern, but it will create objects with empty properties for those lines.

   @Bean
   public static RegexLineTokenizer regexpTokenizer() {
     RegexLineTokenizer tok = new RegexLineTokenizer();
     tok.setRegex("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");
     tok.setNames("name","id","date" );
     tok.setStrict(false);
     return tok;
   }

Here is what that translates to as an XML configuration:

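<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
    <property name="resource" value="/file path" />
    <property name="linesToSkip" value="2" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.RegexLineTokenizer">
                    <property name="names"
                              value="name,id,date"/>
                    <property name="regex"
                              value="([A-Za-z]{3})(\d{4})(\d{2}/\d{2}/\d{2})"/>
                    <property name="strict" value="false"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <!-- Parse the object -->
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <property name="prototypeBeanName" value="testRecord" />
                </bean>
            </property>
        </bean>
    </property>
</bean>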
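
To see what the pattern captures independently of Spring Batch, here is a minimal sketch using plain java.util.regex on the sample lines from the question. The class name RecordRegexDemo and the helper fields are hypothetical, and the sketch uses find(), which searches anywhere in the line. Whether RegexLineTokenizer itself searches within a line or requires the whole line to match is worth checking against your Spring Batch version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Spring Batch.
public class RecordRegexDemo {

    // Same pattern as in the tokenizer: 3 letters, 4 digits, then a dd/dd/dd date.
    static final Pattern RECORD =
            Pattern.compile("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");

    // Returns [name, id, date] if the line contains a record, otherwise an empty list.
    static List<String> fields(String line) {
        Matcher m = RECORD.matcher(line);
        List<String> out = new ArrayList<>();
        if (m.find()) { // find() looks for the pattern anywhere in the line
            for (int g = 1; g <= m.groupCount(); g++) {
                out.add(m.group(g));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [XBC, 1289, 11/10/20]
        System.out.println(fields("XBC128911/10/20"));
        // prints [ABC, 1234, 11/10/20]
        System.out.println(fields("<File fileId=\"123\" xmlns=\"abc:XYZ\" > ABC123411/10/20"));
        // prints []
        System.out.println(fields("</File>"));
    }
}
```

Note that in the sample file the first record (ABC123411/10/20) sits on the same line as the opening File tag. If the real file is laid out that way, skipping two lines would also skip that record, so a smaller linesToSkip value may be needed.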

CodePudding user response:

I would keep it simple and create a tasklet that transforms this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

into this:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

and then create a chunk-oriented step with a FlatFileItemReader to parse the new file. This would be simpler than trying to find a way to ignore lines, use regular expressions to parse the content, etc.
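
The transformation itself could look roughly like the sketch below, written in plain Java so it can be tried outside a job. The names XmlWrapperStripper, strip and transform are hypothetical. It assumes that no tag spans several lines, that record text never contains < or >, and that records carry no significant leading or trailing spaces (they are trimmed). A tasklet's execute method could call transform with the input and output paths before the chunk-oriented step runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch.
public class XmlWrapperStripper {

    // Keeps only the record text: drops the XML declaration and the File open/close tags.
    static List<String> strip(List<String> lines) {
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            // Remove anything that looks like a tag, e.g. <?xml ...?>, <File ...>, </File>.
            String text = line.replaceAll("<[^>]*>", "").trim();
            if (!text.isEmpty()) {
                records.add(text);
            }
        }
        return records;
    }

    // Reads the wrapped file and writes the plain fixed-length file.
    static void transform(Path input, Path output) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Files.write(output, strip(lines), StandardCharsets.UTF_8);
    }
}
```

Because the record on the opening tag's line is kept, the resulting file needs no linesToSkip setting and can be read with the FixedLengthTokenizer column ranges from the question.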
