I have the XML file below, which I would like to import into pandas using pdx.read_xml(). At first I tried to find the root with this code:
import xml.etree.ElementTree as ET
tree = ET.parse('DOCDB-202141-Amend-PubDate20211005AndBefore-EP-0002.xml')
root = tree.getroot()
and then used the root like this:
df = pdx.read_xml('myfile.xml', [root.tag, "Element '{http://www.epo.org/exchange}exchange-documents' at 0x7fe0f85d6770"])
but it does not seem to work (I get a KeyError). How can I read the XML file? Unfortunately, I cannot provide further information, as I don't yet know the exact content of the XML file. I only know that it is PATSTAT data that should be loaded into pandas. If you need more info, just let me know.
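A likely source of the KeyError: ElementTree reports namespaced tags in `{namespace-URI}local-name` form, and the second list item passed above is the printed repr of an Element object, not a tag name. If `pdx` is the third-party pandas_read_xml package (an assumption, since the import is not shown), then, as far as I know, it navigates a dict built by xmltodict, whose keys keep the prefix as written in the file (e.g. `exch:exchange-documents`), so neither spelling would match. The standard-library sketch below, run on a hypothetical trimmed sample, only shows what ElementTree actually returns; `split_clark` is a helper name made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the DOCDB file's root element.
SAMPLE = (
    '<exch:exchange-documents xmlns:exch="http://www.epo.org/exchange" dtd-version="2.5.8">'
    '<exch:exchange-document country="EP" doc-number="2224545"/>'
    '</exch:exchange-documents>'
)


def split_clark(tag):
    """Split an ElementTree tag of the form '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # tag without a namespace


root = ET.fromstring(SAMPLE)
# ElementTree drops the 'exch' prefix and stores the full namespace URI instead.
print(root.tag)               # {http://www.epo.org/exchange}exchange-documents
print(split_clark(root.tag))  # ('http://www.epo.org/exchange', 'exchange-documents')
# Printing an Element gives its repr ("<Element '...' at 0x...>"), not a tag name.
print(repr(root).startswith("<Element '{http://www.epo.org/exchange}exchange-documents'"))  # True
```

In other words, copying the printed Element into the key list passes a description of an object, which no dictionary key will ever equal.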
Thank you!
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- edited with XMLSpy v2018 rel. 2 (x64) (http://www.altova.com) by (EPO / Europäisches Patentamt München) -->
<!DOCTYPE exch:exchange-documents SYSTEM "docdb-entities.dtd">
<exch:exchange-documents xmlns:exch="http://www.epo.org/exchange" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.epo.org/exchange exchange-documents-v2.5.8.xsd" date-of-exchange="20211011" dtd-version="2.5.8" file="DOCDB-202141-Amend-PubDate20211005AndBefore-EP-0001" no-of-documents="0000032" originating-office="EP">
<exch:exchange-document country="EP" doc-number="2224545" kind="C0" doc-id="470936316" date-publ="20170622" family-id="42184427" is-representative="NO" date-of-last-exchange="20211011" date-of-previous-exchange="20210126" date-added-docdb="20170630" originating-office="EP" status="A">
<exch:bibliographic-data>
<exch:publication-reference data-format="docdb">
<document-id lang="en">
<country>EP</country>
<doc-number>2224545</doc-number>
<kind>C0</kind>
<date>20170622</date>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc">
<document-id lang="en">
<doc-number>EP2224545</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="original">
<document-id>
<doc-number>2224545</doc-number>
</document-id>
</exch:publication-reference>
<exch:classifications-ipcr>
<classification-ipcr sequence="1">
<text>H01R 13/514 20060101AFI20151116BHEP </text>
</classification-ipcr>
<classification-ipcr sequence="2">
<text>H01R 13/46 20060101ALI20151116BHEP </text>
</classification-ipcr>
<classification-ipcr sequence="3">
<text>H01R 13/66 20060101ALI20151116BHEP </text>
</classification-ipcr>
<classification-ipcr sequence="4">
<text>H01R 24/58 20110101ALI20110101RMEP </text>
</classification-ipcr>
<classification-ipcr sequence="5">
<text>H01R 24/64 20110101ALI20151116BHEP </text>
</classification-ipcr>
</exch:classifications-ipcr>
<exch:patent-classifications>
<patent-classification sequence="1">
<classification-scheme office="EP" scheme="CPCI">
<date>20130101</date>
</classification-scheme>
<classification-symbol>H01R 13/465 </classification-symbol>
<symbol-position>L</symbol-position>
<classification-value>A</classification-value>
<classification-status>B</classification-status>
<classification-data-source>H</classification-data-source>
<generating-office>EP</generating-office>
<action-date>
<date>20130101</date>
</action-date>
</patent-classification>
<patent-classification sequence="2">
<classification-scheme office="EP" scheme="CPCI">
<date>20130101</date>
</classification-scheme>
<classification-symbol>H01R 13/514 </classification-symbol>
<symbol-position>F</symbol-position>
<classification-value>I</classification-value>
<classification-status>B</classification-status>
<classification-data-source>H</classification-data-source>
<generating-office>EP</generating-office>
<action-date>
<date>20130101</date>
</action-date>
</patent-classification>
<patent-classification sequence="3">
<classification-scheme office="EP" scheme="CPCI">
<date>20130101</date>
</classification-scheme>
<classification-symbol>H01R 13/6658 </classification-symbol>
<symbol-position>L</symbol-position>
<classification-value>I</classification-value>
<classification-status>B</classification-status>
<classification-data-source>H</classification-data-source>
<generating-office>EP</generating-office>
<action-date>
<date>20130101</date>
</action-date>
</patent-classification>
<patent-classification sequence="4">
<classification-scheme office="EP" scheme="CPCI">
<date>20130101</date>
</classification-scheme>
<classification-symbol>H01R 24/64 </classification-symbol>
<symbol-position>L</symbol-position>
<classification-value>A</classification-value>
<classification-status>B</classification-status>
<classification-data-source>H</classification-data-source>
<generating-office>EP</generating-office>
<action-date>
<date>20130819</date>
</action-date>
</patent-classification>
</exch:patent-classifications>
<exch:application-reference is-representative="NO" doc-id="274384603" data-format="docdb">
<document-id>
<country>EP</country>
<doc-number>10152335</doc-number>
<kind>A</kind>
<date>20100201</date>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>EP20100152335</doc-number>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="original">
<document-id>
<doc-number>10152335.5</doc-number>
</document-id>
</exch:application-reference>
<exch:language-of-publication>en</exch:language-of-publication>
<exch:priority-claims>
<exch:priority-claim sequence="1" data-format="docdb">
<document-id doc-id="322475065">
<country>US</country>
<doc-number>39481609</doc-number>
<kind>A</kind>
<date>20090227</date>
</document-id>
<exch:priority-active-indicator>Y</exch:priority-active-indicator>
</exch:priority-claim>
<exch:priority-claim sequence="1" data-format="epodoc">
<document-id>
<doc-number>US20090394816</doc-number>
</document-id>
</exch:priority-claim>
<exch:priority-claim sequence="1" data-format="original">
<document-id>
<doc-number>394816</doc-number>
</document-id>
</exch:priority-claim>
</exch:priority-claims>
<exch:parties>
<exch:applicants>
<exch:applicant sequence="1" data-format="docdb">
<exch:applicant-name>
<name>TYCO ELECTRONICS CORP</name>
</exch:applicant-name>
<residence>
<country>US</country>
</residence>
</exch:applicant>
<exch:applicant sequence="1" data-format="docdba">
<exch:applicant-name>
<name>Tyco Electronics Corporation</name>
</exch:applicant-name>
</exch:applicant>
</exch:applicants>
<exch:inventors>
<exch:inventor sequence="1" data-format="docdb">
<exch:inventor-name>
<name>PEPE PAUL JOHN</name>
</exch:inventor-name>
<residence>
<country>US</country>
</residence>
</exch:inventor>
<exch:inventor sequence="2" data-format="docdb">
<exch:inventor-name>
<name>MUIR SHELDON EASTON</name>
</exch:inventor-name>
<residence>
<country>US</country>
</residence>
</exch:inventor>
<exch:inventor sequence="1" data-format="docdba">
<exch:inventor-name>
<name>Pepe, Paul John</name>
</exch:inventor-name>
</exch:inventor>
<exch:inventor sequence="2" data-format="docdba">
<exch:inventor-name>
<name>Muir, Sheldon Easton</name>
</exch:inventor-name>
</exch:inventor>
</exch:inventors>
</exch:parties>
<exch:designation-of-states>
<exch:designation-epc>
<exch:contracting-states>
<country>AT</country>
<country>BE</country>
<country>BG</country>
<country>CH</country>
<country>CY</country>
<country>CZ</country>
<country>DE</country>
<country>DK</country>
<country>EE</country>
<country>ES</country>
<country>FI</country>
<country>FR</country>
<country>GB</country>
<country>GR</country>
<country>HR</country>
<country>HU</country>
<country>IE</country>
<country>IS</country>
<country>IT</country>
<country>LI</country>
<country>LT</country>
<country>LU</country>
<country>LV</country>
<country>MC</country>
<country>MK</country>
<country>MT</country>
<country>NL</country>
<country>NO</country>
<country>PL</country>
<country>PT</country>
<country>RO</country>
<country>SE</country>
<country>SI</country>
<country>SK</country>
<country>SM</country>
<country>TR</country>
</exch:contracting-states>
<exch:up-participating-states>
<country>IT</country>
</exch:up-participating-states>
<exch:validation-states>
<country>new</country>
</exch:validation-states>
</exch:designation-epc>
</exch:designation-of-states>
<exch:invention-title lang="de" data-format="docdba">Kassette für ein Kabelverbindungssystem</exch:invention-title>
<exch:invention-title lang="en" data-format="docdba">Cassette for a cable interconnect system</exch:invention-title>
<exch:invention-title lang="fr" data-format="docdba">Cassette pour système d'interconnexion de câbles</exch:invention-title>
<exch:dates-of-public-availability>
<exch:printed-with-grant>
<document-id>
<date>20170622</date>
</document-id>
</exch:printed-with-grant>
</exch:dates-of-public-availability>
</exch:bibliographic-data>
<exch:patent-family>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>EP</country>
<doc-number>10152335</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>EP20100152335</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>EP</country>
<doc-number>2224545</doc-number>
<kind>A1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>EP2224545</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="2">
<document-id>
<country>EP</country>
<doc-number>2224545</doc-number>
<kind>B1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="2">
<document-id>
<doc-number>EP2224545</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="3">
<document-id>
<country>EP</country>
<doc-number>2224545</doc-number>
<kind>C0</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="3">
<document-id>
<doc-number>EP2224545</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>JP</country>
<doc-number>2010028634</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>JP20100028634</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>JP</country>
<doc-number>2010205724</doc-number>
<kind>A</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>JP2010205724</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>TW</country>
<doc-number>99102542</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>TW20100102542</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>TW</country>
<doc-number>I497834</doc-number>
<kind>B</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>TWI497834B</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="2">
<document-id>
<country>TW</country>
<doc-number>201037898</doc-number>
<kind>A</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="2">
<document-id>
<doc-number>TW201037898</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="YES">
<document-id>
<country>US</country>
<doc-number>39481609</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>US20090394816</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>US</country>
<doc-number>7909643</doc-number>
<kind>B2</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>US7909643</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="2">
<document-id>
<country>US</country>
<doc-number>2010221931</doc-number>
<kind>A1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="2">
<document-id>
<doc-number>US2010221931</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:abstract lang="en" country="EP" doc-number="2224545" kind="A1" data-format="docdba" abstract-source="EPO">
<exch:p>A cassette (20) includes a housing (30) having a front (34) and a rear (36). The housing (30) has a plurality of plug cavities (42) open at the front (34) for receiving plugs therein, and the housing (30) has a rear chamber (102) open to the plug cavities. The cassette also includes a contact subassembly (100) having a circuit board (104) and a plurality of contacts (144) arranged in contact sets (146) coupled to the circuit board (104). Each contact set (146) is configured to mate with a corresponding plug, where the contact subassembly (100) is loaded into the rear chamber (102) such that the contact sets are received in different corresponding plug cavities (42). The circuit board (104) is oriented generally parallel to the front of the housing (30) when the contact subassembly (100) is loaded into the rear chamber (102).</exch:p>
</exch:abstract>
</exch:patent-family>
</exch:exchange-document>
<exch:exchange-document country="EP" doc-number="2228299" kind="C0" doc-id="470936320" date-publ="20170622" family-id="42229294" is-representative="NO" date-of-last-exchange="20211011" date-of-previous-exchange="20210126" date-added-docdb="20170630" originating-office="EP" status="A">
<exch:bibliographic-data>
<exch:publication-reference data-format="docdb">
<document-id lang="de">
<country>EP</country>
<doc-number>2228299</doc-number>
<kind>C0</kind>
<date>20170622</date>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc">
...
</exch:applicants>
<exch:inventors>
<exch:inventor sequence="1" data-format="docdb">
<exch:inventor-name>
<name>SCHINDLER RODNEY A</name>
</exch:inventor-name>
<residence>
<country>US</country>
</residence>
</exch:inventor>
<exch:inventor sequence="2" data-format="docdb">
<exch:inventor-name>
<name>SHEIDLER ALAN D</name>
</exch:inventor-name>
<residence>
<country>US</country>
</residence>
</exch:inventor>
<exch:inventor sequence="3" data-format="docdb">
<exch:inventor-name>
<name>SCHMITZ JOSEPH P</name>
</exch:inventor-name>
<residence>
<country>US</country>
</residence>
</exch:inventor>
<exch:inventor sequence="1" data-format="docdba">
<exch:inventor-name>
<name>Schindler, Rodney A.</name>
</exch:inventor-name>
</exch:inventor>
<exch:inventor sequence="2" data-format="docdba">
<exch:inventor-name>
<name>Sheidler, Alan D.</name>
</exch:inventor-name>
</exch:inventor>
<exch:inventor sequence="3" data-format="docdba">
<exch:inventor-name>
<name>Schmitz, Joseph P.</name>
</exch:inventor-name>
</exch:inventor>
</exch:inventors>
</exch:parties>
<exch:designation-of-states>
<exch:designation-epc>
<exch:contracting-states>
<country>AT</country>
<country>BE</country>
<country>BG</country>
<country>CH</country>
<country>CY</country>
<country>CZ</country>
<country>DE</country>
<country>DK</country>
<country>EE</country>
<country>ES</country>
<country>FI</country>
<country>FR</country>
<country>GB</country>
<country>GR</country>
<country>HR</country>
<country>HU</country>
<country>IE</country>
<country>IS</country>
<country>IT</country>
<country>LI</country>
<country>LT</country>
<country>LU</country>
<country>LV</country>
<country>MC</country>
<country>MK</country>
<country>MT</country>
<country>NL</country>
<country>NO</country>
<country>PL</country>
<country>PT</country>
<country>RO</country>
<country>SE</country>
<country>SI</country>
<country>SK</country>
<country>SM</country>
<country>TR</country>
</exch:contracting-states>
<exch:up-participating-states>
<country>NL</country>
</exch:up-participating-states>
<exch:validation-states>
<country>PL</country>
</exch:validation-states>
</exch:designation-epc>
</exch:designation-of-states>
<exch:invention-title lang="de" data-format="docdba">Motor für eine landwirtschaftliche Erntemaschine mit isochroner Drehmomentkurve mit Überleistung</exch:invention-title>
<exch:invention-title lang="en" data-format="docdba">Engine for an agricultural harvester having isochronous torque curve with power bulge</exch:invention-title>
<exch:invention-title lang="fr" data-format="docdba">Moteur pour moissonneuse agricole doté d'une courbe de couple isochrone avec gonflement d'alimentation</exch:invention-title>
<exch:dates-of-public-availability>
<exch:printed-with-grant>
<document-id>
<date>20170622</date>
</document-id>
</exch:printed-with-grant>
</exch:dates-of-public-availability>
</exch:bibliographic-data>
<exch:patent-family>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>AU</country>
<doc-number>2010201937</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>AU20100201937</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>AU</country>
<doc-number>2010201937</doc-number>
<kind>A1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>AU2010201937</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>BR</country>
<doc-number>PI1001711</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>BR2010PI01711</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>BR</country>
<doc-number>PI1001711</doc-number>
<kind>A2</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>BRPI1001711</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>EA</country>
<doc-number>201000681</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>EA20100000681</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>EA</country>
<doc-number>201000681</doc-number>
<kind>A1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>EA201000681</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="NO">
<document-id>
<country>EP</country>
<doc-number>10161600</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>EP20100161600</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>EP</country>
<doc-number>2253822</doc-number>
<kind>A2</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>EP2253822</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="2">
<document-id>
<country>EP</country>
<doc-number>2253822</doc-number>
<kind>A3</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="2">
<document-id>
<doc-number>EP2253822</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="3">
<document-id>
<country>EP</country>
<doc-number>2253822</doc-number>
<kind>B1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="3">
<document-id>
<doc-number>EP2253822</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="4">
<document-id>
<country>EP</country>
<doc-number>2253822</doc-number>
<kind>C0</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="4">
<document-id>
<doc-number>EP2253822</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:family-member>
<exch:application-reference data-format="docdb" is-representative="YES">
<document-id>
<country>US</country>
<doc-number>47019809</doc-number>
<kind>A</kind>
</document-id>
</exch:application-reference>
<exch:application-reference data-format="epodoc">
<document-id>
<doc-number>US20090470198</doc-number>
</document-id>
</exch:application-reference>
<exch:publication-reference data-format="docdb" sequence="1">
<document-id>
<country>US</country>
<doc-number>8352155</doc-number>
<kind>B2</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="1">
<document-id>
<doc-number>US8352155</doc-number>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="docdb" sequence="2">
<document-id>
<country>US</country>
<doc-number>2010299048</doc-number>
<kind>A1</kind>
</document-id>
</exch:publication-reference>
<exch:publication-reference data-format="epodoc" sequence="2">
<document-id>
<doc-number>US2010299048</doc-number>
</document-id>
</exch:publication-reference>
</exch:family-member>
<exch:abstract lang="en" country="EP" doc-number="2253822" kind="A2" data-format="docdba" abstract-source="EPO">
<exch:p>A method of operating an internal combustion engine (12) in an agricultural harvester (10) includes the steps of operating the engine (12) in a normal mode with a base torque curve (42) as a function of engine operating speed and engine power output, the base torque curve (42) being generally isochronous at a rated operating speed over a power output range terminating at a rated power output; and operating the engine (12) in a boost mode with a boost torque curve (48) when a power boost is required above the rated power output, the boost torque curve (48) having a power output which is above the base torque curve over a predefined range of the operating speed.</exch:p>
</exch:abstract>
</exch:patent-family>
</exch:exchange-document>
</exch:exchange-documents>
EDIT: The XML above is incomplete due to space limits. You can access the full XML here: enter link description here. The expected pandas DataFrame should have the following columns:
Country, date-produced, date-of-exchange, dtd-version, file, no-of-documents, originating-office, status
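The columns listed above mix attributes of the root exch:exchange-documents element (date-of-exchange, dtd-version, file, no-of-documents, originating-office) with attributes of each exch:exchange-document (country, status). One way to get that shape, sketched here with only the standard library and a hypothetical trimmed sample, is to copy the root attributes onto every document row. date-produced does not appear in the excerpt, so it is read with `.get()` and comes out as None when absent; the exact column split and the choice of the root's originating-office over the document's are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"exch": "http://www.epo.org/exchange"}

# Attribute names taken from the expected columns; the root/document split is an assumption.
# originating-office exists on both levels; here the root's value is used.
ROOT_KEYS = ["date-produced", "date-of-exchange", "dtd-version", "file",
             "no-of-documents", "originating-office"]
DOC_KEYS = ["country", "status"]

# Hypothetical trimmed sample: root attributes copied from the excerpt, documents shortened.
SAMPLE = (
    '<exch:exchange-documents xmlns:exch="http://www.epo.org/exchange" '
    'date-of-exchange="20211011" dtd-version="2.5.8" '
    'file="DOCDB-202141-Amend-PubDate20211005AndBefore-EP-0001" '
    'no-of-documents="0000032" originating-office="EP">'
    '<exch:exchange-document country="EP" doc-number="2224545" status="A"/>'
    '<exch:exchange-document country="EP" doc-number="2228299" status="A"/>'
    '</exch:exchange-documents>'
)


def exchange_rows(root):
    """Return one dict per exch:exchange-document, merged with the root's attributes."""
    # Root attributes are identical for every document, so read them once.
    # .get() returns None for attributes that are missing (e.g. date-produced here).
    shared = {key: root.get(key) for key in ROOT_KEYS}
    rows = []
    for doc in root.findall("exch:exchange-document", NS):
        row = dict(shared)
        row.update({key: doc.get(key) for key in DOC_KEYS})
        rows.append(row)
    return rows


rows = exchange_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row["country"], row["status"], row["file"])
```

For the real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`, and `pandas.DataFrame(rows)` turns the list of dicts into the requested frame. I have not tried this on the full DOCDB file.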
Answer:
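You could read the XML file with pandas' built-in read_xml (pandas 1.3 or later; here pdx is just an alias for pandas.read_xml). First read the file:
from io import StringIO
from pandas import read_xml as pdx
with open(r'C:\Users\ASUS\Downloads\stackoverflow\file.xml', 'r', encoding='utf-8') as f:
    xml_data = f.read()
Then specify the XPath of the elements you want as rows, together with a mapping for the exch namespace prefix (newer pandas versions deprecate passing literal XML strings, hence the StringIO wrapper):
df = pdx(StringIO(xml_data),
         xpath="//exch:family-member//exch:application-reference//document-id",
         namespaces={"exch": "http://www.epo.org/exchange"})
The output has the columns ['country', 'doc-number', 'kind'].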
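For comparison, the same selection can be sketched with the standard library's ElementTree, whose `findall` accepts a comparable prefix-to-URI `namespaces` mapping. ElementTree supports only a subset of XPath, so this approximates the XPath used in the answer rather than replacing `read_xml` (which by default relies on lxml). The trimmed sample and the helper name `application_ids` are hypothetical. Missing children, as in the epodoc references, simply produce fewer keys, where `read_xml` would leave the cell missing.

```python
import xml.etree.ElementTree as ET

NS = {"exch": "http://www.epo.org/exchange"}

# Hypothetical trimmed sample with one family member and two application references.
SAMPLE = (
    '<exch:exchange-documents xmlns:exch="http://www.epo.org/exchange">'
    '<exch:exchange-document><exch:patent-family><exch:family-member>'
    '<exch:application-reference data-format="docdb">'
    '<document-id><country>EP</country><doc-number>10152335</doc-number>'
    '<kind>A</kind></document-id>'
    '</exch:application-reference>'
    '<exch:application-reference data-format="epodoc">'
    '<document-id><doc-number>EP20100152335</doc-number></document-id>'
    '</exch:application-reference>'
    '</exch:family-member></exch:patent-family></exch:exchange-document>'
    '</exch:exchange-documents>'
)


def application_ids(root):
    """Mimic the answer's XPath: each document-id becomes a dict of child tag -> text."""
    # document-id and its children carry no namespace, so they need no prefix.
    path = ".//exch:family-member//exch:application-reference//document-id"
    return [
        {child.tag: child.text for child in doc_id}
        for doc_id in root.findall(path, NS)
    ]


rows = application_ids(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```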
Answer:
I guess you should remove the square brackets from the pdx.read_xml() call.