I am converting xml to csv for a client project & cannot get the conversion to work. I am using Pyth

Time:10-08

Any ideas why this is not working?

The XML being converted (the real file is much longer than this):

    <XML>
      <ClinicalData StudyOID="XXXXXXXXX" MetaDataVersionOID="53" mdsol_AuditSubCategoryName="QueryAnswer">
        <SubjectData SubjectKey="XXXXXXXX-b7cd-4f97-8d25-594219de192f" mdsol_SubjectKeyType="SubjectUUID" mdsol_SubjectName="XX-002">
          <SiteRef LocationOID="15" XXXX_StudyEnvSiteNumber="15" />
          <StudyEventData StudyEventOID="DAY1" StudyEventRepeatKey="DAY1[1]" mdsol_InstanceId="47077">
            <FormData FormOID="SS_DISP" FormRepeatKey="1" mdsol_DataPageId="320656">
              <ItemGroupData ItemGroupOID="SS_DISP" mdsol_RecordId="797737">
                <ItemData ItemOID="SS_DISP.DISPDAT" TransactionType="Upsert">
                  <AuditRecord>
                    <UserRef UserOID="[email protected]" />
                    <LocationRef LocationOID="15" mdsol_StudyEnvSiteNumber="15" />
                    <DateTimeStamp>2022-01-28T05:27:54</DateTimeStamp>
                    <ReasonForChange>
                    </ReasonForChange>
                    <SourceID>12345678</SourceID>
                  </AuditRecord>
                  <mdsol_Query QueryRepeatKey="123456" Value="Date of XXXX does not equal the XXXY Date. Please review and correct else clarify." Status="Answered" Response="Issues with XXXXX IWRS XXXXXX" />
                </ItemData>
              </ItemGroupData>
            </FormData>
          </StudyEventData>
        </SubjectData>
      </ClinicalData>
    </XML>

I am using this Python script to do the conversion (or trying to); I am pretty new to this.

    from xml.etree import ElementTree
    tree = ElementTree.parse('xml.xml')
    root = tree.getroot()
    data = []
    for ClinicalData in root:
     StudyOID = getattr(child.find('StudyOID'), 'text', None)
     MetaDataVersionOID = getattr(child.find('MetaDataVersionOID'), 'text', None)
     mdsol_AuditSubCategoryName = getattr(child.find('mdsol_AuditSubCategoryName'), 'text', None)
     SubjectKey = getattr(child.find('SubjectKey'), 'text', None)
     #print('{}, {}, {}, {}'.format(StudyOID, MetaDataVersionOID, mdsol_AuditSubCategoryName, SubjectKey))
     data.append('{}, {}, {}, {}'.format(StudyOID, MetaDataVersionOID, mdsol_AuditSubCategoryName, SubjectKey))
    #print (data)
    with open('output.csv', 'w') as f: f.write('\n'.join([row for row in data[1:]]))

The error message I get is as follows:

    File "<stdin>", line 9
    with open('output.csv', 'w') as f: f.write('\n'.join([row for row in data[1:]]))
    ^^^^

    SyntaxError: invalid syntax
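The `File "<stdin>"` in the traceback suggests the script was pasted into the interactive `>>>` prompt rather than run as a file. The reported `line 9` fits this: counting from the `for` line, the `with` line is the ninth. CPython's interactive prompt compiles one top-level statement at a time (roughly like `compile(..., 'single')`), so when the unindented `with` line follows the loop body with no blank line in between, it is read as part of the `for` statement and rejected. The sketch below reproduces this with the built-in `compile()` on a cut-down copy of the loop; `SNIPPET` and `compiles()` are hypothetical names used only for illustration, and nothing in the snippet is executed.

```python
# Cut-down stand-in for the question's loop followed by the `with` line.
# It is only compiled, never run, so `root` and `data` need not exist.
SNIPPET = (
    "for ClinicalData in root:\n"
    "    data.append(ClinicalData)\n"
    "#print (data)\n"
    "with open('output.csv', 'w') as f: f.write('')\n"
)


def compiles(source: str, mode: str) -> bool:
    """Return True if `source` compiles in the given compile() mode."""
    try:
        compile(source, "<stdin>", mode)
    except SyntaxError:
        return False
    return True


# 'exec' compiles a whole module, which is what `python script.py` does.
print("as a script:", compiles(SNIPPET, "exec"))
# 'single' accepts exactly one interactive statement, so the `with` line
# after the loop body is a syntax error here.
print("at the >>> prompt:", compiles(SNIPPET, "single"))
```

In practice this means either saving the code to a file and running it with `python`, or leaving a blank line after the loop body when pasting it into the prompt so the loop is finished before the `with` statement starts.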

Answer:

In the code above you already collect the rows in the list `data`. Convert that list into a pandas DataFrame as below and write it to CSV:

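    import pandas as pd

    # Column names must be strings, and each item in `data` must be a list/tuple of
    # four values (not one "a, b, c, d" string) so it lines up with these columns.
    cols = ['StudyOID', 'MetaDataVersionOID', 'mdsol_AuditSubCategoryName', 'SubjectKey']
    df = pd.DataFrame(data, columns=cols)
    # Writing dataframe to csv (index=False leaves out the row-number column)
    df.to_csv('output.csv', index=False)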
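A few things in the question's loop also keep `data` from fitting that DataFrame: `child` is never defined (the loop variable is `ClinicalData`), `data[1:]` drops the first row even though no header row was added, and `.find('StudyOID')` searches for a child *element* named `StudyOID`. In the sample XML, `StudyOID`, `MetaDataVersionOID` and `mdsol_AuditSubCategoryName` are attributes of `<ClinicalData>`, and `SubjectKey` is an attribute of the nested `<SubjectData>`; ElementTree reads attributes with `Element.get()`. The sketch below is one way to build `data` as a list of four-value rows. It assumes one output row per `<SubjectData>` and no XML namespaces in the real file (if the file declares a default namespace, the tag names passed to `iter()` need the `{namespace-uri}` prefix). It uses the standard-library `csv` module so it runs without pandas; `rows_from_xml`, `write_csv` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the question's XML.

```python
import csv
import io
from xml.etree import ElementTree

COLUMNS = ["StudyOID", "MetaDataVersionOID", "mdsol_AuditSubCategoryName", "SubjectKey"]


def rows_from_xml(root):
    """Return one [StudyOID, MetaDataVersionOID, category, SubjectKey] row per <SubjectData>."""
    rows = []
    # root.iter() walks the root element and all of its descendants
    for clinical in root.iter("ClinicalData"):
        # These three values are attributes of <ClinicalData>: use .get(), not .find()
        study = clinical.get("StudyOID")
        version = clinical.get("MetaDataVersionOID")
        category = clinical.get("mdsol_AuditSubCategoryName")
        for subject in clinical.iter("SubjectData"):
            # SubjectKey is an attribute of the nested <SubjectData> element
            rows.append([study, version, category, subject.get("SubjectKey")])
    return rows


def write_csv(rows, out):
    """Write a header row followed by the data rows to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Trimmed copy of the question's sample (hypothetical stand-in for xml.xml)
SAMPLE = """<XML>
  <ClinicalData StudyOID="XXXXXXXXX" MetaDataVersionOID="53" mdsol_AuditSubCategoryName="QueryAnswer">
    <SubjectData SubjectKey="XXXXXXXX-b7cd-4f97-8d25-594219de192f" mdsol_SubjectName="XX-002" />
  </ClinicalData>
</XML>"""

data = rows_from_xml(ElementTree.fromstring(SAMPLE))
buffer = io.StringIO()  # in-memory file so the example writes nothing to disk
write_csv(data, buffer)
print(buffer.getvalue())
```

For the real file, the last four lines would become something like `root = ElementTree.parse('xml.xml').getroot()` followed by `with open('output.csv', 'w', newline='') as f: write_csv(rows_from_xml(root), f)`; the `newline=''` argument is what the `csv` module documentation recommends for files passed to `csv.writer`. The same `data` list of lists can also be passed straight to `pd.DataFrame(data, columns=cols)`.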