I'm trying to generate an XML-file based on a dataframe. The XML output should be structured as having some general information, then a list of transactions, followed by some outtro comments. The tricky bit is that each transaction type needs a different output and parent tag, meaning I need to generate several templates that are executed conditionally. The data looks something like this:
secondary_info_1 | transactionnumber | date_transaction | amount_local | from_country | to_country | transaction_type | gender |
---|---|---|---|---|---|---|---|
Some stuff goes here | trx001 | 2022-01-02 | 2883 | DE | incoming_1 | M | |
Some stuff goes here | trx003 | 2022-01-04 | 857 | incoming_2 | M | ||
Some stuff goes here | trx004 | 2022-01-05 | 4 | AT | outgoing_1 | M | |
Some stuff goes here | trx006 | 2022-01-07 | 26 | outgoing_2 | M |
And the desired output looks something like this:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<record>
<secondary_info_1>Some stuff goes here</secondary_info_1>
<incoming_transaction_1>
<transaction_from>
<transactionnumber>trxno001</transactionnumber>
<date_transaction>2022-01-02</date_transaction>
<amount_local>2883</amount_local>
<from_country>DE</from_country>
</transaction_from>
<transaction_to>
<personalia>
<gender>M</gender>
</personalia>
</transaction_to>
</incoming_transaction_1>
<incoming_transaction_2>
<transaction_from>
<transactionnumber>trxno003</transactionnumber>
<date_transaction>2022-01-04</date_transaction>
<amount_local>857</amount_local>
</transaction_from>
<transaction_to>
<personalia>
<gender>M</gender>
</personalia>
</transaction_to>
</incoming_transaction_2>
<outgoing_transaction_1>
<transaction_to>
<personalia>
<gender>M</gender>
</personalia>
</transaction_to>
<transaction_from>
<transactionnumber>trxno004</transactionnumber>
<date_transaction>2022-01-05</date_transaction>
<amount_local>4</amount_local>
<from_country>AT</from_country>
</transaction_from>
</outgoing_transaction_1>
<outgoing_transaction_2>
<transaction_to>
<personalia>
<gender>M</gender>
</personalia>
</transaction_to>
<transaction_from>
<transactionnumber>trxno006</transactionnumber>
<date_transaction>2022-01-07</date_transaction>
<amount_local>26</amount_local>
</transaction_from>
</outgoing_transaction_2>
<secondary_info_2>Some more stuff goes here</secondary_info_2>
</record>
So in short we have a transaction overview where the personalia is attached to each transaction, and some information above and below these transactions (secondary_info_1/2
). The templates change based on the value in the transaction_type
column.
I find generating a more static XML works very well indeed by using the stylesheet argument in the pandas.to_xml()
function, however I struggle to create a conditionally triggered template. I tried modifying the approach taken here (cell 366 and 367, right before "XML Final Notes"), but I can't figure out how to use <xsl_if test="transaction_type == 'incoming_1'">
etc to trigger my different transaction templates conditionally.
Can someone help with this?
Edit: The general structure of the desired output is the following (context is also added as text in the tags):
<record>
<secondary_info>
<nested_secondary_info>A bunch of added information should go in this first section of the document</nested_secondary_info>
<incoming_transaction_1>
<incoming_nested_transaction_info_1>Some, but not all of the fields go here, and the structure changes based on the transaction type</incoming_nested_transaction_info_1>
</incoming_transaction_1>
<outgoing_transaction_1>
<outgoing_nested_transaction_info_2>For example, the "from_country" field should only be returned for incoming_transaction_1, and "to_country" should only be returned for "outgoing_transaction_1"</outgoing_nested_transaction_info_2>
</outgoing_transaction_1>
<incoming_transaction_2>
<incoming_nested_transaction_info_2>Ideally the transactions would also be sorted based on the date, but it seems that works as long as the dataframe is sorted correctly</incoming_nested_transaction_info_2>
</incoming_transaction_2>
<some_more_secondary_info>
<some_more_nested_secondary_info>I would also need to add some information to the end of this document as a sibling of the transactional information</some_more_nested_secondary_info>
</some_more_secondary_info>
</record>
Writing the specific templates will be a pain no matter how it's done, due to the specific requirements for the output. However I've been able to write a basic XSLT that allows me to produce the transactional overview:
<xsl:stylesheet version = "1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/data">
<report>
<xsl:apply-templates select="row"/>
</report>
</xsl:template>
<xsl:template match="row">
<xsl:choose>
<xsl:when test="transaction_type='incoming_1'">
<transaction_account_to_consumer>
<xsl:copy-of select="amount_local"/>
<t_from>
<xsl:copy-of select="from_country"/>
</t_from>
</transaction_account_to_consumer>
</xsl:when>
<xsl:when test="transaction_type='incoming_2'">
<transaction_entity_to_consumer>
<xsl:copy-of select="amount_local"/>
</transaction_entity_to_consumer>
</xsl:when>
<xsl:when test="transaction_type='outgoing_1'">
<transaction_consumer_to_account>
<t_to>
<xsl:copy-of select="to_country"/>
</t_to>
<xsl:copy-of select="amount_local"/>
</transaction_consumer_to_account>
</xsl:when>
<xsl:when test="transaction_type='outgoing_2'">
<transaction_consumer_to_entity>
<xsl:copy-of select="amount_local"/>
</transaction_consumer_to_entity>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
This template will balloon to several hundred lines (each transaction type sub-template requires >50 tags), which is annoying, but manageable. However, a bigger problem is that although I can get the conditional templating to work for my purposes, I cannot figure out how to add my secondary_info above and below the transactional information.
If I add this before xsl:choose, it just appends it above each transaction, whereas I would like this only to be appended once, before the list of transactions.
My idea to solve this was to match only the first row in a different template, and then use that template before <xsl:apply-templates select="row"/>, meaning the information at the top of the XSLT would look more like this:
<xsl:template match="/data">
<report>
<xsl:apply-templates select="first_row"/>
<xsl:apply-templates select="row"/>
<xsl:apply-templates select="first_row"/>
</report>
</xsl:template>
Where each template would only select certain fields from the first row of the dataframe. However, the two "select="first_row"" templates would select different fields from the first row of the transaction overview.
Is there a way I can achieve this?
CodePudding user response:
May be as simple as you have the test syntax slightly wrong, it's test="transaction_type = 'incoming_1'"
, with =
, not ==
.
Here's a simple XSLT to only keep <row>
if transaction_type is incoming_1, main.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="row">
<xsl:if test="transaction_type = 'incoming_1'">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
When I run this:
df = pd.read_csv('input.csv')
print(df.to_xml(stylesheet='main.xsl'))
I get:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<row>
<index>0</index>
<secondary_info_1>Some stuff goes here</secondary_info_1>
<transactionnumber>trx001</transactionnumber>
<date_transaction>2022-01-02</date_transaction>
<amount_local>2883</amount_local>
<from_country>DE</from_country>
<to_country/>
<transaction_type>incoming_1</transaction_type>
<gender>M</gender>
</row>
</data>