I'm trying to import data from an XML file to PostgreSQL. I would like to create columns from the attributes in the <row>
tag. The problem is when I use xpath
to get text, I get nothing.
Here is a sample input testinput.xml
file
<tags>
<row Id="1" TagName=".net" Count="316293" ExcerptPostId="3624959" WikiPostId="3607476" />
<row Id="2" TagName="html" Count="1116853" ExcerptPostId="3673183" WikiPostId="3673182" />
<row Id="3" TagName="javascript" Count="2343663" ExcerptPostId="3624960" WikiPostId="3607052" />
</tags>
Here is my myquery.sql
DO $$
DECLARE xml_string xml;
BEGIN
xml_string := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('/home/me/data/testinput.xml'), 'UTF8'));
DROP TABLE IF EXISTS tags;
CREATE TABLE tags AS
SELECT
(xpath('//ID', x))[1]::text as id,
(xpath('//TagName', x))[1]::text as tagName,
(xpath('//Count', x))[1]::text as count,
(xpath('//ExcerptPostId', x))[1]::text as excerptPostId,
(xpath('//WikiPostId', x))[1]::text as wikiPostId
FROM unnest(xpath('//row', xml_string)) x;
END$$;
SELECT * FROM tags;
I got this
id | tagname | count | excerptpostid | wikipostid
---- --------- ------- --------------- ------------
| | | |
| | | |
| | | |
(3 rows)
By the way, my .xml
file is around 100GB, so any fast solution would help. Thank you!
CodePudding user response:
Because the values you intend to parse are attributes (not elements), you need to parse with @
keyword:
SELECT
(xpath('//@Id', x))[1]::text as id,
(xpath('//@TagName', x))[1]::text as tagName,
(xpath('//@Count', x))[1]::text as count,
(xpath('//@ExcerptPostId', x))[1]::text as excerptPostId,
(xpath('//@WikiPostId', x))[1]::text as wikiPostId
FROM unnest(xpath('//row', xml_string)) x;