R - parse XML and process it

Time:09-10

I have an XML with the following format:

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:entity_soft_delete="http://drupal.org/project/entity_soft_delete/rdf#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:og="http://ogp.me/ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:sioc="http://rdfs.org/sioc/ns#" xmlns:sioct="http://rdfs.org/sioc/types#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" version="2.0" xml:base="http://skillspanorama.cedefop.europa.eu/en/eusprssfeed">
<channel>
<title>Skills Panorama RSS Feeds</title>
<link>http://skillspanorama.cedefop.europa.eu/en/eusprssfeed</link>
<description/>
<language>en</language>
<atom:link href="http://skillspanorama.cedefop.europa.eu/en/skills-panorama-rss-feeds.xml" rel="self" type="application/rss+xml"/>
<item>
<title>Title1</title>
<link>http://skillspanorama.cedefop.europa.eu/en/news/cedefop-future-vet-now</link>
<description><div ><div ><div >Cedefop: The future of VET is now</div></div></div><div ><div ><div ><span  property="dc:date" datatype="xsd:dateTime" content="2017-10-31T00:00:00 01:00">Tuesday, October 31, 2017</span></div></div></div><div ><div ><div ><p>The 26th annual international conference of the European Forum for Vocational Education and Training (EfVET), which was held in Thessaloniki from 25 to 28 October, focused on ‘aligning work and education to the future’.</p> </div></div></div><div ><div ><div  property="content:encoded"><p> </p> <p> </p> <p> </p> <p><img alt="" src="http://www.cedefop.europa.eu/files/images/efvet_final.jpg" typeof="Image" /></p> <p>The 26th annual international conference of the European Forum for Vocational Education and Training (EfVET), which was held in Thessaloniki from 25 to 28 October, focused on ‘aligning work and education to the future’.</p> <p>Over 220 training providers from across Europe and guests from Hong Kong, the USA and Canada took part in the conference.</p> <p>In his keynote speech, Cedefop Director James Calleja said that businesses and the different stakeholders assist one another in defining short- and long-term skill needs locally, regionally and at European level. 
He stressed that the relationship between businesses and vocational education and training (VET) providers is a necessity for both, as lifelong learning is part and parcel of the future of work and of VET.</p> <p>Mr Calleja added that evidence from Cedefop research shows clearly an increasingly beneficial connection between work-based learning and employability for learners, workers and employers.</p> <p>According to the Cedefop Director, we face at least four challenges either from the perspective of the training provider or from the side of the employer:</p> <ul><li>different stakeholders understand the concept of skill needs differently – businesses look for immediacy while social partners stress the need to develop talent gradually and permanently;</li> <li>if stakeholders do not have a common understanding of skill needs, there is a marked coordination failure and stakeholders need to address it by seeking each other’s opinion and consensus;</li> <li>involving regional and local authorities in skill needs assessment and anticipation heightens skills governance challenges in relation to bottom-up coordination and financial constraints;</li> <li>one needs to improve feedback loops between businesses and stakeholders through efficient communication channels so that all speak the same language.</li> </ul><p>Speaking again at the end of the conference, Mr Calleja stated that the challenges of VET and employment are the same; advancements in technology, automation, demographic changes and new forms of learning give VET a new profile of excellence and inclusion, and work environments the necessity to create learning spaces for their employees throughout their careers.</p> <p>The divide between learning and working is gradually closing down, he added, as no one can afford to stop learning. VET providers cannot deliver ready-made human capital and employers cannot afford to have low quality if they seek to be competitive. 
This dual role creates a new bond between the world of education and training and the world of employment, with employees benefitting from workplace and lifelong learning.</p> </div></div></div><div ><div ><div ><a href="http://www.cedefop.europa.eu/en/news-and-press/news/future-vet-now" target="_blank">The future of VET is now</a></div></div></div></description>
<pubDate>Tue, 31 Oct 2017 11:12:01 +0000</pubDate>
<dc:creator>Anonymous</dc:creator>
<guid isPermaLink="false">20761 at http://skillspanorama.cedefop.europa.eu</guid>
</item>
<item>
<title>European Commission on Employment: Report confirms effectiveness of EU Globalisation Fund</title>
<link>http://skillspanorama.cedefop.europa.eu/en/news/european-commission-employment-report-confirms-effectiveness-eu-globalisation-fund</link>
<description><div ><div ><div >European Commission on Employment: Report confirms effectiveness of EU Globalisation Fund</div></div></div><div ><div ><div ><span  property="dc:date" datatype="xsd:dateTime" content="2017-10-31T00:00:00 01:00">Tuesday, October 31, 2017</span></div></div></div><div ><div ><div ><p>The Commission has published its report on the performance of the European Globalisation Adjustment Fund (EGF) in 2015 and 2016.</p> <p><img src="http://ec.europa.eu/social/BlobServlet?mode=displayPicture&amp;photoId=9897" /></p> </div></div></div><div ><div ><div  property="content:encoded"><p>The report reaffirms the role of the Fund as a flagship demonstration of European solidarity within the limits of its set-up and budgetary availabilities, having helped close to 19,500 workers to adjust to changing trade patterns and consequences of the economic and financial crisis in that period..</p> <p>European Commissioner for Employment, Social Affairs, Skills and Labour Mobility Marianne Thyssen,said: "Today's results demonstrate the added value of the Globalisation Adjustment Fund in helping redundant workers who have difficulties to find a new job. The assistance worth €70 million of the Fund has paid off: in 2015 and 2016, 9,072assisted workers were re-employed, despite the challenging labour market situation these people faced. This year's tenth anniversary of the Globalisation Adjustment Fund marks it as a proof of European solidarity to workers falling victim to mass lay-offs caused by globalisation or the crisis."</p> <p>9,072 workers, or close to half of the workers who participated in the Globalisation Adjustment Fund measures, had found new jobs or were self-employed after one year, at the end of the implementation period of the measures. 
An additional 645 people were at that time in education or training to increase their future employability.</p> <p>EU countries also reported that the personal situation, employability and self-confidence of the workers concerned had visibly improved thanks to the Globalisation Adjustment Fund assistance and services. This was even the case for those who had not found new work immediately after the end of the measures.</p> <p><!-- VIDEO FILTER - INVALID CODEC IN: [video:http://europa.eu/!Df66FB] --></p> <p>These positive results are encouraging, especially given the difficult context in which they have been achieved. The labour market situation in some EU countries was particularly challenging in the period covered by the report. Mass lay-offs occurred in territories that were already suffering from above average unemployment rates. Many supported workers were low-skilled or had other disadvantages as jobseekers.</p> <p>This proves once again that EU funding, such as the Globalisation Adjustment Fund, can make a difference, especially for the most vulnerable people in our societies.</p> </div></div></div><div ><div ><div ><a href="http://ec.europa.eu/social/main.jsp?langId=en&amp;catId=89&amp;newsId=2889&amp;furtherNews=yes" target="_blank">Employment: Report confirms effectiveness of EU Globalisation Fund</a></div></div></div></description>
<pubDate>Tue, 31 Oct 2017 11:12:01 +0000</pubDate>
<dc:creator>Anonymous</dc:creator>
<guid isPermaLink="false">20763 at http://skillspanorama.cedefop.europa.eu</guid>
</item>

I tried parsing it with the following code:

# Load the package required to read XML files.
library("XML")

# Also load the other required package.
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "xml_file.xml")

# Print the result.
print(result)


xml_data <- xmlToList(result)

However, the result is not what I want. How can I parse the XML file correctly? I want to keep all the text between the description tags so that I can process it later, stored in a data frame or another suitable data structure.

Answer:

If you want all of the text from the description nodes collected into a single character vector, it is straightforward with the xml2 package:

library(xml2)
library(dplyr)

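# read_xml() accepts a URL, a file path, or a string of literal XML
page <- read_xml('URL, file or text information here')
# optional: strip default namespaces (this feed declares none, so it is not needed here)
# xml_ns_strip(page)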

#find all of the description nodes and convert to text
description <- xml_find_all(page, ".//description") %>% xml_text()

With the (incomplete) sample data above, this produces a character vector of length 3: the first element is an empty string from the channel-level <description/>, and the other two hold the item descriptions. If you want to break this down further, accept this answer and ask a new question clarifying exactly what you are looking for.
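
Since the question mentions a data frame, here is a minimal sketch of one way to build one with xml2, with one row per item. The inline `feed` string is a shortened, hypothetical stand-in for the real file (its second title, URLs and description text are placeholders), and `feed_df` is only an illustrative name. It assumes each item has at most one title, link and description.

```r
library(xml2)

# Shortened stand-in for the feed in the question (placeholder content).
feed <- '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Skills Panorama RSS Feeds</title>
<description/>
<item>
<title>Title1</title>
<link>http://example.org/a</link>
<description>&lt;p&gt;First text&lt;/p&gt;</description>
</item>
<item>
<title>Title2</title>
<link>http://example.org/b</link>
<description>&lt;p&gt;Second text&lt;/p&gt;</description>
</item>
</channel>
</rss>'

# For the real data, pass the file path or URL instead of the string.
page <- read_xml(feed)

# Select only the <item> nodes, so the empty channel-level
# <description/> is not picked up.
items <- xml_find_all(page, ".//item")

# xml_find_first() returns one result per item (NA after xml_text()
# where a child is missing), which keeps the columns aligned.
feed_df <- data.frame(
  title = xml_text(xml_find_first(items, "./title")),
  link = xml_text(xml_find_first(items, "./link")),
  description = xml_text(xml_find_first(items, "./description")),
  stringsAsFactors = FALSE
)

print(feed_df)
```

Searching relative to each item, rather than taking all description nodes at once, keeps every description on the same row as its own title and link. The description column still contains the embedded HTML markup, so any further cleaning is a separate step.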
