I'm a professional indexer, new to Ruby and Nokogiri, and I need some assistance.
I'm working on a set of macros that will allow me to take an XML file, output from my indexing software, and parse it into valid \index{} commands for inclusion in a LaTeX source file. Each XML <record> contains at least two <field> tags, so I will have to iterate over the multiple <field> tags to build my \index{} entry.
The following is an example of an index record from the XML file:
<record time="2022-08-27T17:25:12" id="30">
<field><text style="i"/><hide>SS </hide>Titanic<text/></field>
<field>passengers</field>
<field ><text style="b"/>5<text/></field>
</record>
I will produce intermediate output of this record in the form of:
\index{Titanic@\textit{SS Titanic}!passengers|textbf} 5
(The numeric locator is used to place the \index{} entry at the correct spot in the LaTeX file and won't be included in the LaTeX source file.)
I am using Nokogiri to manipulate the XML file and have reached the point where I can return a node list that contains just the <field> tags for each <record>, but I need to be able to retrieve all the text in the <field>, including the formatting information. If I use the text method on a <field>, it returns "SS Titanic", for example, with all formatting information stripped away.
I'm stuck on how to access the entire text string in the <field> tag. Once I can get that, I have a good idea of how to structure my parser.
Any help will be greatly appreciated.
CodePudding user response:
Does this help?
require 'nokogiri'

# A heredoc avoids having to escape the double quotes inside the XML.
xml = <<~XML
  <record time="2022-08-27T17:25:12" id="30">
  <field><text style="i"/><hide>SS </hide>Titanic<text/></field>
  <field>passengers</field>
  <field ><text style="b"/>5<text/></field>
  </record>
XML
fields = Nokogiri::XML(xml).xpath(".//field")
puts fields.first.text   # prints SS Titanic
p fields.map(&:text)     # prints ["SS Titanic", "passengers", "5"]