I have a csv that I need to convert to XML using Python. I'm a novice python dev.
Example CSV data:
Amount,Code
CODE50,1246
CODE50,6290
CODE25,1077
CODE25,9790
CODE100,5319
CODE100,4988
Necessary output XML
<coupon-codes coupon-id="CODE50">
<code>1246</code>
<code>1246</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE25">
<code>1077</code>
<code>9790</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE100">
<code>5319</code>
<code>4988</code>
<coupon-codes/>
My guess is I have to use pandas to pull the csv in, use pandas.groupby
to group the Amount
column, then push this into an element/subelement to create the xml, then print/push to a xml file. I can't get the groupby
to work and don't know how to then push that into the element, then populate the sub element.
CodePudding user response:
import xml.etree.ElementTree as ET
import pandas as pd
df = pd.read_csv('path/to/csvfile.csv')
root = ET.Element('root')
for coupon_id, df_group in df.groupby('Amount'):
coupon_codes = ET.Element('coupon-codes', {'coupon-id': coupon_id})
for coupon_code in df_group['Code']:
code = ET.Element('code')
code.text = str(coupon_code)
coupon_codes.append(code)
root.append(coupon_codes)
To convert it to a minified string, use this:
min_xml = ET.tostring(root, encoding='utf8')
print(min_xml)
Output:
<root><coupon-codes coupon-id="CODE100"><code>5319</code><code>4988</code></coupon-codes><coupon-codes coupon-id="CODE25"><code>1077</code><code>9790</code></coupon-codes><coupon-codes coupon-id="CODE50"><code>1246</code><code>6290</code></coupon-codes></root>
To convert it to a formatted string, use this:
ET.indent(root)
pretty_xml = ET.tostring(root, encoding='utf8')
print(pretty_xml)
Output:
<root>
<coupon-codes coupon-id="CODE100">
<code>5319</code>
<code>4988</code>
</coupon-codes>
<coupon-codes coupon-id="CODE25">
<code>1077</code>
<code>9790</code>
</coupon-codes>
<coupon-codes coupon-id="CODE50">
<code>1246</code>
<code>6290</code>
</coupon-codes>
</root>
CodePudding user response:
I can't comment yet, but is better if you put the code that you used for trying to solve the problem.
But the way there is a very similar question here, simple-csv-to-xml-conversion-python
And here is a post for grouping by, group by/cumcount
With this information i achieved it, by doing this
df = pd.DataFrame(['CODE50',
'CODE50',
'CODE25',
'CODE25',
'CODE100',
'CODE100'],columns=['code'])
df['amount'] = [1246,
6290,
1077,
9790,
5319,
4988]
df['columns'] = df.groupby('code')['amount'].cumcount()
result = df.pivot(index='code', columns='columns')
def convert(df):
str = ''
for a in df.index:
str = '\n'f"""<coupon-codes coupon-id="{a}">"""
for b in df['amount'].loc[a]:
str = '\n'' 'f"<code>{b}</code>"
return str
xml = convert(result)
print(xml)
result
<coupon-codes coupon-id="CODE100">
<code>5319</code>
<code>4988</code>
<coupon-codes coupon-id="CODE25">
<code>1077</code>
<code>9790</code>
<coupon-codes coupon-id="CODE50">
<code>1246</code>
<code>6290</code>
CodePudding user response:
why not use df.to_xml()? This simple api will generate xml from the dataframe
CodePudding user response:
ElementTree can do it without a trouble. See below
from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree
data = defaultdict(list)
with open('in.txt') as f:
next(f)
for line in f:
parts = line.split(',')
data[parts[0]].append(parts[1].strip())
root = Element('root')
for k,v in data.items():
sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
for vv in v:
sub_v = SubElement(sub,'code')
sub_v.text = vv
tree = ElementTree(root)
with open('out.xml', 'w') as f:
tree.write(f, encoding='unicode')