Any ideas why this script is not working?
This is the XML being converted (the real file is much longer than this excerpt):
<XML>
<ClinicalData StudyOID="XXXXXXXXX" MetaDataVersionOID="53" mdsol_AuditSubCategoryName="QueryAnswer">
<SubjectData SubjectKey="XXXXXXXX-b7cd-4f97-8d25-594219de192f" mdsol_SubjectKeyType="SubjectUUID" mdsol_SubjectName="XX-002">
<SiteRef LocationOID="15" XXXX_StudyEnvSiteNumber="15" />
<StudyEventData StudyEventOID="DAY1" StudyEventRepeatKey="DAY1[1]" mdsol_InstanceId="47077">
<FormData FormOID="SS_DISP" FormRepeatKey="1" mdsol_DataPageId="320656">
<ItemGroupData ItemGroupOID="SS_DISP" mdsol_RecordId="797737">
<ItemData ItemOID="SS_DISP.DISPDAT" TransactionType="Upsert">
<AuditRecord>
<UserRef UserOID="[email protected]" />
<LocationRef LocationOID="15" mdsol_StudyEnvSiteNumber="15" />
<DateTimeStamp>2022-01-28T05:27:54</DateTimeStamp>
<ReasonForChange>
</ReasonForChange>
<SourceID>12345678</SourceID>
</AuditRecord>
<mdsol_Query QueryRepeatKey="123456" Value="Date of XXXX does not equal the XXXY Date. Please review and correct else clarify." Status="Answered" Response="Issues with XXXXX IWRS XXXXXX" />
</ItemData>
</ItemGroupData>
</FormData>
</StudyEventData>
</SubjectData>
</ClinicalData>
</XML>
I am using this Python script to do the conversion, or at least trying to. I am pretty new to this.
from xml.etree import ElementTree
tree = ElementTree.parse('xml.xml')
root = tree.getroot()
data = []
for ClinicalData in root:
    StudyOID = getattr(child.find('StudyOID'), 'text', None)
    MetaDataVersionOID = getattr(child.find('MetaDataVersionOID'), 'text', None)
    mdsol_AuditSubCategoryName = getattr(child.find('mdsol_AuditSubCategoryName'), 'text', None)
    SubjectKey = getattr(child.find('SubjectKey'), 'text', None)
    #print('{}, {}, {}, {}'.format(StudyOID, MetaDataVersionOID, mdsol_AuditSubCategoryName, SubjectKey))
    data.append('{}, {}, {}, {}'.format(StudyOID, MetaDataVersionOID, mdsol_AuditSubCategoryName, SubjectKey))
#print (data)
with open('output.csv', 'w') as f: f.write('\n'.join([row for row in data[1:]]))
The error message I get is as follows:
  File "<stdin>", line 9
    with open('output.csv', 'w') as f: f.write('\n'.join([row for row in data[1:]]))
    ^^^^
SyntaxError: invalid syntax
CodePudding user response:
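You already have the rows in the data list ("data"). Convert that into a pandas DataFrame as below and write it to CSV. The column names must be quoted strings, and because each entry in "data" is a single comma-joined string, it has to be split back into four values first.
import pandas as pd
cols = ['StudyOID', 'MetaDataVersionOID', 'mdsol_AuditSubCategoryName', 'SubjectKey']
# Each row in data is one comma-joined string, so split it into its four fields
df = pd.DataFrame([row.split(', ') for row in data], columns=cols)
# Writing dataframe to csv
df.to_csv('output.csv')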
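A note on the original error: the traceback points at "<stdin>", line 9, which suggests the code was pasted into the interactive prompt. There, an indented block such as the for loop has to be ended with a blank line before the next top-level statement, otherwise the following "with" line is reported as a SyntaxError. Saving the code to a .py file and running it avoids that. Two other things in the script look like they would still give wrong output: the loop variable is ClinicalData but the body uses child, and StudyOID and the others are XML attributes, so they are read with .get() rather than find(...).text. Below is a minimal sketch of one way to do it with only the standard library. It assumes the XML has no namespaces (as in the excerpt), and the names SAMPLE, COLUMNS, extract_rows and write_csv are made up for illustration.

```python
import csv
import io
from xml.etree import ElementTree

# Trimmed-down sample with the same nesting as the XML in the question.
SAMPLE = """<XML>
<ClinicalData StudyOID="XXXXXXXXX" MetaDataVersionOID="53" mdsol_AuditSubCategoryName="QueryAnswer">
<SubjectData SubjectKey="XXXXXXXX-b7cd-4f97-8d25-594219de192f" mdsol_SubjectName="XX-002" />
</ClinicalData>
</XML>"""

COLUMNS = ["StudyOID", "MetaDataVersionOID", "mdsol_AuditSubCategoryName", "SubjectKey"]


def extract_rows(root):
    """Return one list of attribute values per ClinicalData element."""
    rows = []
    for clinical in root.iter("ClinicalData"):
        # SubjectKey is an attribute of the nested SubjectData element.
        subject = clinical.find("SubjectData")
        rows.append([
            clinical.get("StudyOID"),
            clinical.get("MetaDataVersionOID"),
            clinical.get("mdsol_AuditSubCategoryName"),
            subject.get("SubjectKey") if subject is not None else None,
        ])
    return rows


def write_csv(rows, out):
    """Write a header line followed by the rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


root = ElementTree.fromstring(SAMPLE)
rows = extract_rows(root)

# Written to an in-memory buffer here so the sketch runs without any files.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

For the real file, the root would come from ElementTree.parse('xml.xml').getroot(), and the output would go to open('output.csv', 'w', newline='') instead of the in-memory buffer. Note that the original data[1:] would also drop the first row, which is probably not intended since the list has no header entry.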