Home > Software engineering >  Generating an XML file with conditional output based on a dataframe
Generating an XML file with conditional output based on a dataframe

Time:02-16

I'm trying to generate an XML-file based on a dataframe. The XML output should be structured as having some general information, then a list of transactions, followed by some outtro comments. The tricky bit is that each transaction type needs a different output and parent tag, meaning I need to generate several templates that are executed conditionally. The data looks something like this:

secondary_info_1 transactionnumber date_transaction amount_local from_country to_country transaction_type gender
Some stuff goes here trx001 2022-01-02 2883 DE incoming_1 M
Some stuff goes here trx003 2022-01-04 857 incoming_2 M
Some stuff goes here trx004 2022-01-05 4 AT outgoing_1 M
Some stuff goes here trx006 2022-01-07 26 outgoing_2 M

And the desired output looks something like this:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<record>
    <secondary_info_1>Some stuff goes here</secondary_info_1>
    <incoming_transaction_1>
        <transaction_from>
            <transactionnumber>trxno001</transactionnumber>
            <date_transaction>2022-01-02</date_transaction>
            <amount_local>2883</amount_local>
            <from_country>DE</from_country>
        </transaction_from>
        <transaction_to>
            <personalia>
                <gender>M</gender>
            </personalia>
        </transaction_to>
    </incoming_transaction_1>
    <incoming_transaction_2>
        <transaction_from>
            <transactionnumber>trxno003</transactionnumber>
            <date_transaction>2022-01-04</date_transaction>
            <amount_local>857</amount_local>
        </transaction_from>
        <transaction_to>
            <personalia>
                <gender>M</gender>
            </personalia>
        </transaction_to>
    </incoming_transaction_2>
    <outgoing_transaction_1>
        <transaction_to>
            <personalia>
                <gender>M</gender>
            </personalia>
        </transaction_to>
        <transaction_from>
            <transactionnumber>trxno004</transactionnumber>
            <date_transaction>2022-01-05</date_transaction>
            <amount_local>4</amount_local>
            <from_country>AT</from_country>
        </transaction_from>
    </outgoing_transaction_1>
    <outgoing_transaction_2>
        <transaction_to>
            <personalia>
                <gender>M</gender>
            </personalia>
        </transaction_to>
        <transaction_from>
            <transactionnumber>trxno006</transactionnumber>
            <date_transaction>2022-01-07</date_transaction>
            <amount_local>26</amount_local>
        </transaction_from>
    </outgoing_transaction_2>
    <secondary_info_2>Some more stuff goes here</secondary_info_2>
</record>

So in short we have a transaction overview where the personalia is attached to each transaction, and some information above and below these transactions (secondary_info_1/2). The templates change based on the value in the transaction_typecolumn.

I find generating a more static XML works very well indeed by using the stylesheet argument in the pandas.to_xml() function, however I struggle to create a conditionally triggered template. I tried modifying the approach taken here (cell 366 and 367, right before "XML Final Notes"), but I can't figure out how to use <xsl_if test="transaction_type == 'incoming_1'"> etc to trigger my different transaction templates conditionally.

Can someone help with this?

Edit: The general structure of the desired output is the following (context is also added as text in the tags):

<record>
    <secondary_info>
        <nested_secondary_info>A bunch of added information should go in this first section of the document</nested_secondary_info>
    <incoming_transaction_1>
        <incoming_nested_transaction_info_1>Some, but not all of the fields go here, and the structure changes based on the transaction type</incoming_nested_transaction_info_1>
    </incoming_transaction_1>
    <outgoing_transaction_1>
        <outgoing_nested_transaction_info_2>For example, the "from_country" field should only be returned for incoming_transaction_1, and "to_country" should only be returned for "outgoing_transaction_1"</outgoing_nested_transaction_info_2>
    </outgoing_transaction_1>
    <incoming_transaction_2>
        <incoming_nested_transaction_info_2>Ideally the transactions would also be sorted based on the date, but it seems that works as long as the dataframe is sorted correctly</incoming_nested_transaction_info_2>
    </incoming_transaction_2>
    <some_more_secondary_info>
        <some_more_nested_secondary_info>I would also need to add some information to the end of this document as a sibling of the transactional information</some_more_nested_secondary_info> 
    </some_more_secondary_info>
</record>

Writing the specific templates will be a pain no matter how it's done, due to the specific requirements for the output. However I've been able to write a basic XSLT that allows me to produce the transactional overview:

<xsl:stylesheet version = "1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/data">
        <report>
            <xsl:apply-templates select="row"/>
        </report>
    </xsl:template>

    <xsl:template match="row">
        <xsl:choose>
            <xsl:when test="transaction_type='incoming_1'">
                <transaction_account_to_consumer>
                    <xsl:copy-of select="amount_local"/>
                    <t_from>
                        <xsl:copy-of select="from_country"/>
                    </t_from>
                </transaction_account_to_consumer>
            </xsl:when>
            <xsl:when test="transaction_type='incoming_2'">
                <transaction_entity_to_consumer>
                    <xsl:copy-of select="amount_local"/>
                </transaction_entity_to_consumer>
            </xsl:when>
            <xsl:when test="transaction_type='outgoing_1'">
                <transaction_consumer_to_account>
                    <t_to>
                        <xsl:copy-of select="to_country"/>
                    </t_to>
                    <xsl:copy-of select="amount_local"/>
                </transaction_consumer_to_account>
            </xsl:when>
            <xsl:when test="transaction_type='outgoing_2'">
                <transaction_consumer_to_entity>
                    <xsl:copy-of select="amount_local"/>
                </transaction_consumer_to_entity>
            </xsl:when>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

This template will balloon to several hundred lines (each transaction type sub-template requires >50 tags), which is annoying, but manageable. However, a bigger problem is that although I can get the conditional templating to work for my purposes, I cannot figure out how to add my secondary_info above and below the transactional information.

If I add this before xsl:choose, it just appends it above each transaction, whereas I would like this only to be appended once, before the list of transactions.

My idea to solve this was to match only the first row in a different template, and then use that template before <xsl:apply-templates select="row"/>, meaning the information at the top of the XSLT would look more like this:

<xsl:template match="/data">
    <report>
        <xsl:apply-templates select="first_row"/>
        <xsl:apply-templates select="row"/>
        <xsl:apply-templates select="first_row"/>
    </report>
</xsl:template>

Where each template would only select certain fields from the first row of the dataframe. However, the two "select="first_row"" templates would select different fields from the first row of the transaction overview.

Is there a way I can achieve this?

CodePudding user response:

May be as simple as you have the test syntax slightly wrong, it's test="transaction_type = 'incoming_1'", with =, not ==.

Here's a simple XSLT to only keep <row> if transaction_type is incoming_1, main.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="row">
        <xsl:if test="transaction_type = 'incoming_1'">
            <xsl:copy>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

When I run this:

df = pd.read_csv('input.csv')

print(df.to_xml(stylesheet='main.xsl'))

I get:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <row>
    <index>0</index>
    <secondary_info_1>Some stuff goes here</secondary_info_1>
    <transactionnumber>trx001</transactionnumber>
    <date_transaction>2022-01-02</date_transaction>
    <amount_local>2883</amount_local>
    <from_country>DE</from_country>
    <to_country/>
    <transaction_type>incoming_1</transaction_type>
    <gender>M</gender>
  </row>
</data>
  • Related