Ideally, the result would look something like this:
LAST NAME | FIRST NAME | MID NAME |
---|---|---|
Doe | John | Bob |
CodePudding user response:
If you are using R, it is straightforward to get these fields with the xml2 or rvest packages. For example, using the first XML file in the linked zip folder:
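```r
library(rvest)  # also re-exports the %>% pipe

# read_html() uses an HTML parser, which lowercases tag and attribute names,
# hence "//info" and "lastnm" rather than the file's "Info" and "lastNm"
entries <- read_html(path_to_xml) %>%
  html_nodes(xpath = "//info")

# One column per attribute of each <info> node
result <- data.frame(Last_Name  = entries %>% html_attr("lastnm"),
                     First_Name = entries %>% html_attr("firstnm"),
                     Mid_Name   = entries %>% html_attr("midnm"))

head(result)
#>   Last_Name First_Name Mid_Name
#> 1    FISHER     ANDREW   MUNSON
#> 2 BACHARACH       ALAN   MARTIN
#> 3     GRAFF    MICHAEL  RAYMOND
#> 4      KAST    WILLIAM    ALLEN
#> 5   McMahan     Robert  Michael
#> 6   JOHNSON       JOHN        C
```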
Created on 2022-07-09 by the reprex package (v2.0.1)
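Since the answer also mentions xml2, here is a minimal sketch using xml2 alone. It parses the input as XML with `read_xml()` instead of HTML, so node and attribute names are case-sensitive and must match the file exactly (`Info`, `lastNm`, `firstNm`, `midNm`, as they appear in the XML package output in the next answer). The small inline document is a hypothetical stand-in for the real file, and its `<Indvls>`/`<Indvl>` nesting is assumed; `//Info` matches the nodes at any depth anyway.

```r
library(xml2)

# Hypothetical stand-in for the real file (nesting assumed);
# with a real file, use read_xml(path_to_xml) instead
sample_xml <- '<Indvls>
  <Indvl><Info lastNm="FISHER" firstNm="ANDREW" midNm="MUNSON"/></Indvl>
  <Indvl><Info lastNm="BACHARACH" firstNm="ALAN" midNm="MARTIN"/></Indvl>
</Indvls>'

doc <- read_xml(sample_xml)

# XML parsing is case-sensitive: use the names exactly as in the file
entries <- xml_find_all(doc, "//Info")

# xml_attr() returns NA for any node that lacks the attribute
result <- data.frame(
  Last_Name  = xml_attr(entries, "lastNm"),
  First_Name = xml_attr(entries, "firstNm"),
  Mid_Name   = xml_attr(entries, "midNm")
)
result
```

Keeping the original case avoids relying on the HTML parser's lowercasing, which is the only reason the rvest version needs `lastnm` instead of `lastNm`.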
CodePudding user response:
Or with the XML package:
```r
library(XML)

# Parse into an internal document so XPath queries can be run on it
doc <- xmlTreeParse(file = path_to_file, useInternalNodes = TRUE)

# XML is case-sensitive, so the XPath uses "Info" exactly as in the file;
# xmlToList() turns each matching node into R values (here, its attributes)
info <- XML::xpathApply(doc, "//Info", function(x) xmlToList(x))
info[[1]]
#> [[1]]
#>    lastNm   firstNm
#>  "FISHER"  "ANDREW"
#>     midNm   indvlPK
#>  "MUNSON" "2917271"
#> actvAGReg link
#> "N" "https://adviserinfo.sec.gov/individual/summary/2917271"
```
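To turn the parsed document into the three-column table from the question, one option (a sketch, not part of the original answer) is `xpathSApply()` with `xmlGetAttr()`, which reads a single attribute from every matching node. Passing `NA_character_` as the default keeps the columns the same length if a node lacks an attribute. The inline sample and its nesting are hypothetical, as is the `get_attr` helper; with the real file, parse `path_to_file` instead.

```r
library(XML)

# Hypothetical sample (nesting assumed); with a real file use xmlParse(path_to_file)
sample_xml <- '<Indvls>
  <Indvl><Info lastNm="FISHER" firstNm="ANDREW" midNm="MUNSON"/></Indvl>
  <Indvl><Info lastNm="BACHARACH" firstNm="ALAN" midNm="MARTIN"/></Indvl>
</Indvls>'
doc <- xmlParse(sample_xml, asText = TRUE)

# Hypothetical helper: one attribute from every <Info> node, NA if absent.
# Extra arguments to xpathSApply() are passed on to xmlGetAttr(node, name, default)
get_attr <- function(name) {
  xpathSApply(doc, "//Info", xmlGetAttr, name, NA_character_)
}

result <- data.frame(
  Last_Name  = get_attr("lastNm"),
  First_Name = get_attr("firstNm"),
  Mid_Name   = get_attr("midNm")
)
result
```

Pulling each attribute directly avoids depending on the exact shape `xmlToList()` returns for attribute-only nodes.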