Python convert CSV to XML with pandas groupby-CodePudding

I have a csv that I need to convert to XML using Python. I'm a novice python dev.

Example CSV data:

Amount,Code
CODE50,1246
CODE50,6290
CODE25,1077
CODE25,9790
CODE100,5319
CODE100,4988

Necessary output XML

<coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>1246</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
<coupon-codes/>

My guess is I have to use pandas to pull the csv in, use pandas.groupby to group the Amount column, then push this into an element/subelement to create the xml, then print/push to a xml file. I can't get the groupby to work and don't know how to then push that into the element, then populate the sub element.

CodePudding user response：

import xml.etree.ElementTree as ET
import pandas as pd

df = pd.read_csv('path/to/csvfile.csv')

root = ET.Element('root')
for coupon_id, df_group in df.groupby('Amount'):
    coupon_codes = ET.Element('coupon-codes', {'coupon-id': coupon_id})
    for coupon_code in df_group['Code']:
        code = ET.Element('code')
        code.text = str(coupon_code)
        coupon_codes.append(code)
    root.append(coupon_codes)

To convert it to a minified string, use this:

min_xml = ET.tostring(root, encoding='utf8')
print(min_xml)

Output:

<root><coupon-codes coupon-id="CODE100"><code>5319</code><code>4988</code></coupon-codes><coupon-codes coupon-id="CODE25"><code>1077</code><code>9790</code></coupon-codes><coupon-codes coupon-id="CODE50"><code>1246</code><code>6290</code></coupon-codes></root>

To convert it to a formatted string, use this:

ET.indent(root)
pretty_xml = ET.tostring(root, encoding='utf8')
print(pretty_xml)

Output:

<root>
  <coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>6290</code>
  </coupon-codes>
</root>

CodePudding user response：

I can't comment yet, but is better if you put the code that you used for trying to solve the problem.

But the way there is a very similar question here, simple-csv-to-xml-conversion-python

And here is a post for grouping by, group by/cumcount

With this information i achieved it, by doing this

df = pd.DataFrame(['CODE50',
    'CODE50',
    'CODE25',
    'CODE25',
    'CODE100',
    'CODE100'],columns=['code'])

df['amount'] = [1246,
    6290,
    1077,
    9790,
    5319,
    4988]

df['columns'] = df.groupby('code')['amount'].cumcount()
result = df.pivot(index='code', columns='columns')

def convert(df):
    str = ''
    for a in df.index:
        str  = '\n'f"""<coupon-codes coupon-id="{a}">"""
        for b in df['amount'].loc[a]:
            str  = '\n''       'f"<code>{b}</code>"
    return str
xml = convert(result)
print(xml)

result

<coupon-codes coupon-id="CODE100">
       <code>5319</code>
       <code>4988</code>
<coupon-codes coupon-id="CODE25">
       <code>1077</code>
       <code>9790</code>
<coupon-codes coupon-id="CODE50">
       <code>1246</code>
       <code>6290</code>

CodePudding user response：

why not use df.to_xml()? This simple api will generate xml from the dataframe

CodePudding user response：

ElementTree can do it without a trouble. See below

from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree

data = defaultdict(list)

with open('in.txt') as f:
  next(f)
  for line in f:
    parts = line.split(',')
    data[parts[0]].append(parts[1].strip())

root = Element('root')
for k,v in data.items():
  sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
  for vv in v:
    sub_v = SubElement(sub,'code')
    sub_v.text = vv

tree = ElementTree(root)
with open('out.xml', 'w') as f:
  tree.write(f, encoding='unicode')