How can I create a "quasi csv" from my XML?

Time:03-29

I am trying to extract all <ref type="biblical"/> elements from a TEI-XML file that looks more or less like this (the header from the project, plus one specific paragraph containing a <ref type="biblical" cRef="">):

<?xml version="1.0" encoding="utf-8"?>

<!--<?xml-model href="../Customization/Schema_Quellentexte/religionsfrieden-quellentexte.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>-->
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="edoc_ed000227_fg_augsburger_interim_zdt">
   
   
   <teiHeader>      
      <fileDesc>         
         <titleStmt>
            <title level="s">Europäische Religionsfrieden Digital</title>
            <title level="a">Augsburger Interim (15.&#x00A0;Mai&#x00A0;/ 30.&#x00A0;Juni 1548) - Deutscher Text</title>
            <principal ref="http://d-nb.info/gnd/111186870">Irene Dingel</principal> 
            <editor role="http://id.loc.gov/vocabulary/relators/edt" ref="http://orcid.org/0000-0002-1509-6960">Thomas Stäcker</editor>
            <editor role="http://id.loc.gov/vocabulary/relators/edt" ref="https://orcid.org/0000-0002-0801-5130">Andreas Zecherle</editor> 
            <editor role="http://id.loc.gov/vocabulary/relators/mrk">Silke Kalmer</editor>           
         </titleStmt>         
         <editionStmt>
            <edition>Digitale Edition gemäß <ref target="http://www.tei-c.org/">TEI P5</ref></edition>
            <funder>Deutsche Forschungsgemeinschaft</funder>
         </editionStmt>         
         <publicationStmt>
            <publisher>
               <orgName ref="http://www.isni.org/0000000119314040">Akademie der Wissenschaften und Literatur Mainz</orgName>
            </publisher>           
            <distributor change="#ch01">
               <orgName ref="http://www.isni.org/0000000110101946">Universitäts- und Landesbibliothek Darmstadt</orgName>
               <idno type="ISIL">http://lobid.org/organisation/DE-17</idno>
            </distributor>
            <date when="2021-12-16" type="issued">2021</date>
            <availability>
               <licence target="https://creativecommons.org/licenses/by/4.0/">
                  <p xml:lang="en">This file is licensed under the terms of the Creative Commons License CC-BY 4.0 (Attribution 4.0 International)</p>
               </licence>
            </availability>
         </publicationStmt>        
         <sourceDesc><!-- bitte ausfüllen; nicht zutreffende Elemente bitte rauslöschen -->
            <msDesc>
               <msIdentifier>
                  <settlement></settlement>                               
                  <repository></repository>
                  <collection></collection>
                  <idno>Signatur</idno>
                  <idno type="urn">urn:nbn:de:gbv:3:1-254930</idno>
                  <idno type="vd16">http://gateway-bayern.de/VD16 ZV 17728</idno>
               </msIdentifier>
               <msContents>
                  <msItem>
                     <title></title>
                     <respStmt>
                        <resp></resp>
                        <persName></persName>
                     </respStmt>
                     <biblStruct><!-- gilt nur für Drucke; rausnehmen, wenn es sich um ein Manuskript handelt! -->
                        <monogr xml:id="vorlage">
                           <imprint>
                              <pubPlace ref="http://www.geonames.org/2874225/mainz.html">Mainz</pubPlace>
                              <publisher>
                                 <persName></persName>
                              </publisher>
                              <date>1548</date>
                           </imprint>
                        </monogr>
                     </biblStruct>
                  </msItem>
               </msContents>
               <history><!-- bei Manuskript, falls Datum und/oder Ort bekannt sind; andernfalls rauslöschen! -->
                  <origin>
                     <date></date>
                     <placeName></placeName>
                  </origin>
               </history>
            </msDesc>    
         </sourceDesc>
      </fileDesc>      
      <encodingDesc>         
         <projectDesc>
            <p>
               <ref target="http://www.eured.de">Europäische Religionsfrieden Digital</ref>
            </p>
         </projectDesc>
         <classDecl>
            <taxonomy xml:id="marcrelator">
               <bibl><idno type="URI">http://id.loc.gov/vocabulary/relators/</idno> MARC Code List for Relators </bibl>
            </taxonomy>
         </classDecl>         
      </encodingDesc>      
      <profileDesc>         
         <langUsage>
            <language ident="deut">Deutsch</language><!-- Sprache hinzufügen -->
         </langUsage>         
      </profileDesc>      
      <revisionDesc>
         <change xml:id="ch01" when="2018-01-18"><p>Die Edition ist 2018 von der Herzog August Bibliothek Wolfenbüttel an die Universitäts- und Landesbibliothek Darmstadt umgezogen.</p></change>
      </revisionDesc>   
   </teiHeader>
<text>
<body>

            <p facs="#facs_13_TextRegion_1624022922087_339">
               <lb facs="#facs_13_line_1624022900872_333" n="N001"/>Da nun das<note type="crit_app"><bibl><ref type="bibl" target="#mehlhausen_augsburger_interim"><surname type="editor">Mehlhausen</surname>, Augsburger Interim</ref>, S.&#x00A0;36</bibl>: des.</note> menschen gemüt dermassen wol <w>zuge<pc>-</pc>
               <lb facs="#facs_13_r2l14" n="N002"/><note place="margin-left" facs="#facs_13_TextRegion_1624023026742_369">
               <lb facs="#facs_13_r1l2" n="N001"/>Eccle. 15.</note>richtet</w> was, <q>hat in Gott gelassen inn der hand seines <w>eig<pc>-</pc>
               <lb facs="#facs_13_r2l15" n="N003"/>nen</w> Raths</q><note type="annotation"><ref type="biblical" cRef="Sir_15,14">Sir 15,14</ref>.</note>, Also weyt, das er nicht weniger macht hette
               <lb facs="#facs_13_r2l16" n="N004"/>zu wölen das gut als das böse. </p>
</body>
</text>
</TEI>

Originally, I wanted to create an HTML table from this as a first step. I then took a step back and decided to do something else first: since I want this to end up as a CSV, I thought a good step in that direction would be to simply list all the elements (not a structured list, just one entry after another in a text file :-D):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:tei ="http:www.tei-c.org/ns/1.0"
   exclude-result-prefixes="xs"
   version="3.0">
   <xsl:strip-space elements="*"/>
   <xsl:template match="/tei:ref[@type='biblical']">
      <!-- header -->
      <xsl:text> Kürzel&#10;</xsl:text>
      <!-- data -->
      <xsl:for-each select="tei:ref[@type='biblical']">

         <xsl:value-of select="@cRef"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
   
</xsl:stylesheet>

I am a little baffled by the result:

The source document is in namespace http://www.tei-c.org/ns/1.0, but none of the template rules match elements in this namespace (Use --suppressXsltNamespaceCheck:on to avoid this warning)

I am using Saxon-EE 9.8.0.8 via Oxygen 20.0. I am still looking for a way to make this a table, but I can also use Python for this. (I am way more comfortable with Python!)

The expected output is the value of every @cRef attribute, one per line (the full result should contain roughly 300 entries):

Kürzel;
Sir_15,14;
{next entry}

...

I am currently recovering from Covid and having a hard time concentrating, so please be gentle if I have overlooked something mundane! :-D

all the best, K

CodePudding user response:

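Three things get in the way in your stylesheet: the tei namespace URI is missing the // after http: (so it does not match the namespace of your document), match="/tei:ref[@type='biblical']" only matches a ref that is the root element of the document, and the output method is not set to text. Here's a way you could do this: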

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:tei ="http://www.tei-c.org/ns/1.0"
   exclude-result-prefixes="xs"
   version="3.0">
   <xsl:strip-space elements="*"/>
   
   <xsl:output method="text"/>
   
   <xsl:template match="/">
      <!-- header -->
      <xsl:text>Kürzel&#10;</xsl:text>
      <!-- data -->
      <xsl:apply-templates select="//tei:ref[@type='biblical']"/>
   </xsl:template>
   
   <xsl:template match="tei:ref[@type='biblical']">
      <xsl:value-of select="@cRef"/>
      <xsl:text>&#10;</xsl:text>
   </xsl:template>
   
</xsl:stylesheet>

See it working here: https://xsltfiddle.liberty-development.net/bF2MmYn
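
To see the effect of the namespace URI in isolation, here is a small Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration, not part of the XSLT solution: the SAMPLE string is a cut-down stand-in for the TEI file, and the two prefix mappings are the URI from the document and the URI as written in the question's stylesheet. Namespace URIs are compared as exact strings, so the second lookup finds nothing.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the TEI file in the question (not the real document).
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    '<ref type="biblical" cRef="Sir_15,14">Sir 15,14</ref>'
    '</p></body></text></TEI>'
)

root = ET.fromstring(SAMPLE)

# Prefix mapping with the namespace URI declared in the source document.
CORRECT = {"tei": "http://www.tei-c.org/ns/1.0"}
# Prefix mapping with the URI as written in the question's stylesheet (no "//").
TYPO = {"tei": "http:www.tei-c.org/ns/1.0"}

query = ".//tei:ref[@type='biblical']"
hits_correct = root.findall(query, CORRECT)
hits_typo = root.findall(query, TYPO)

# The first count covers the one ref in SAMPLE; the second lookup matches nothing.
print(len(hits_correct), len(hits_typo))
```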

CodePudding user response:

If you're using a processor that supports XSLT 2.0 (such as Saxon 9.x), you can simply do:

XSLT 2.0

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xpath-default-namespace="http://www.tei-c.org/ns/1.0">
   <xsl:output method="text"/>

   <xsl:template match="/TEI">
      <xsl:text>Kürzel;&#10;</xsl:text>
      <xsl:value-of select="//ref[@type='biblical']/@cRef" separator=";&#10;"/>
   </xsl:template>

</xsl:stylesheet>
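
Since the question mentions Python as an alternative, the same extraction could be sketched with the standard library alone. This is a minimal sketch under a few assumptions: the helper names biblical_crefs and write_csv are made up for this example, the SAMPLE string is a cut-down stand-in for the real TEI file, and the output is treated as a one-column CSV with ";" as delimiter, so rows carry no trailing semicolon (unlike the expected output shown in the question).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the TEI document, bound to a prefix for the path expressions.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def biblical_crefs(root):
    """Return the cRef value of every <ref type="biblical">, in document order."""
    refs = root.iterfind(".//tei:ref[@type='biblical']", TEI_NS)
    return [ref.get("cRef", "") for ref in refs]


def write_csv(crefs, out, header="Kürzel"):
    """Write a one-column CSV: the header row first, then one cRef per row."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow([header])
    writer.writerows([cref] for cref in crefs)


# Cut-down stand-in for the real file: one bibliographic ref (to be skipped)
# and one biblical ref, both taken from the paragraph in the question.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    '<ref type="bibl" target="#mehlhausen_augsburger_interim">Augsburger Interim</ref>'
    '<ref type="biblical" cRef="Sir_15,14">Sir 15,14</ref>'
    '</p></body></text></TEI>'
)

root = ET.fromstring(SAMPLE)
buffer = io.StringIO()
write_csv(biblical_crefs(root), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For the real document, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE), and a file opened with encoding="utf-8" and newline="" would replace the StringIO buffer.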