How do I pull specific fields from this XML file using R or Python?

Time:07-10

I am attempting to convert this XML file into a table: [image not available]

Ideally, the result would look something like this:

LAST NAME   FIRST NAME   MID NAME
Doe         John         Bob

Answer:

If you are using R, these fields are straightforward to extract with the xml2 or rvest packages, because each name is stored as an attribute (lastNm, firstNm, midNm) of an Info element. For example, using the first XML file in the linked zip folder:

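library(rvest)

# read_html() parses the file as HTML, which lowercases element and
# attribute names: <Info lastNm="..."> is matched as "info" / "lastnm".
# (With xml2::read_xml() you would need the original case instead.)
entries <- read_html(path_to_xml) %>% 
  html_nodes(xpath = "//info")

result <- data.frame(Last_Name = entries %>% html_attr("lastnm"),
                     First_Name = entries %>% html_attr("firstnm"),
                     Mid_Name = entries %>% html_attr("midnm"))

head(result)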
#>   Last_Name First_Name Mid_Name
#> 1    FISHER     ANDREW   MUNSON
#> 2 BACHARACH       ALAN   MARTIN
#> 3     GRAFF    MICHAEL  RAYMOND
#> 4      KAST    WILLIAM    ALLEN
#> 5   McMahan     Robert  Michael
#> 6   JOHNSON       JOHN        C

Created on 2022-07-09 by the reprex package (v2.0.1)
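For the Python half of the question, here is a minimal sketch of the same attribute lookup using the standard library's `xml.etree.ElementTree`. It assumes, based on the output shown in these answers, that each person is an `Info` element carrying `lastNm`, `firstNm` and `midNm` attributes. The `extract_names` helper and the `<Report>`/`<Indvl>` wrapper elements in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_names(root):
    """Return (last, first, middle) tuples for every <Info> element under root.

    XML is case-sensitive, so the names must match the file exactly
    ("Info", "lastNm", "firstNm", "midNm"), unlike the lowercased names
    that rvest's HTML parser exposes. Missing attributes come back as None.
    """
    return [
        (info.get("lastNm"), info.get("firstNm"), info.get("midNm"))
        # iter() walks all descendants, similar to the XPath "//Info"
        for info in root.iter("Info")
    ]


if __name__ == "__main__":
    # Hypothetical sample: the <Report>/<Indvl> wrappers are made up; only the
    # <Info> attributes follow the real file. For a file on disk, use
    # ET.parse(path_to_xml).getroot() instead of ET.fromstring(...).
    sample = ET.fromstring(
        '<Report><Indvl><Info lastNm="Doe" firstNm="John" midNm="Bob"/></Indvl></Report>'
    )
    for last, first, mid in extract_names(sample):
        print(last, first, mid)  # Doe John Bob
```

Using `root.iter("Info")` avoids assuming how deeply the `Info` elements are nested, which matters here because the full structure of the file is not shown.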

Answer:

Or with the XML package:

library(XML)

# path_to_file is the path to one of the XML files
doc <- xmlTreeParse(file = path_to_file, useInternalNodes = TRUE)

# xmlToList() turns each attribute-only <Info> node into a named character vector
info <- xpathApply(doc, "//Info", xmlToList)

info[[1]]

           lastNm                                                  firstNm 
         "FISHER"                                                 "ANDREW" 
            midNm                                                  indvlPK 
         "MUNSON"                                                "2917271" 
        actvAGReg                                                     link 
              "N" "https://adviserinfo.sec.gov/individual/summary/2917271" 
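A rough Python analogue of the xmlToList() approach is sketched below: keep every attribute of each `Info` node as a dict, then write only the name fields as a table with the headings from the question. This is a sketch using the standard `csv` and `xml.etree.ElementTree` modules. The attribute names are taken from the output above, and the helpers `info_records` and `write_name_table`, plus the sample `<Report>` wrapper, are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Output headings from the question, mapped to the attribute names shown
# in the xmlToList() output above.
COLUMNS = {"LAST NAME": "lastNm", "FIRST NAME": "firstNm", "MID NAME": "midNm"}


def info_records(root):
    """Mirror xmlToList() on each //Info node: one dict of all its attributes."""
    return [dict(info.attrib) for info in root.iter("Info")]


def write_name_table(records, out):
    """Write only the name fields as CSV, using the question's headings."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS.keys())
    for rec in records:
        # An absent attribute becomes an empty cell rather than an error.
        writer.writerow(rec.get(attr, "") for attr in COLUMNS.values())


if __name__ == "__main__":
    # Hypothetical one-record sample; attribute names follow the output above.
    root = ET.fromstring(
        '<Report><Info lastNm="Doe" firstNm="John" midNm="Bob" indvlPK="1"/></Report>'
    )
    buf = io.StringIO()
    write_name_table(info_records(root), buf)
    print(buf.getvalue())
```

To write a real file, pass `info_records(ET.parse(path_to_file).getroot())` and a file opened with `newline=""`, which is the setting the `csv` module documentation recommends. Keeping all attributes in the dicts means fields such as indvlPK or link can be added to `COLUMNS` later without re-parsing.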