We are pulling contact info from https://nbpa.com/agents/directory. There is no table on the page, but rather <div>s with <p> elements inside.
We can grab this element with:
library(rvest)
agents_url <- "https://nbpa.com/agents/directory"
agents_page <- agents_url %>% read_html()
agents_page_elements <- agents_page %>% html_nodes('div.accordion-inner')
agents_page_elements[1]
agents_page_elements[1] %>% html_nodes('p')
We are looking to convert this into a 1-row dataframe:
Cell            Email                Professional Credentials
(123) 456-7890  firstlast@email.com  "NBA Certified Player Agent..."
Is this possible to do? The challenging part of this scrape is that each accordion-inner div on the website has a different set of <p> elements: some have Cell: and Email:, others have Education:, Address:, etc. If we can turn each individual node into a 1-row dataframe, we can then bind all of the dataframes together using plyr::rbind.fill().
CodePudding user response:
We can use read.dcf() after extracting the text of the <p> elements, since each one holds a "Field: value" pair:
new <- agents_page_elements[1] %>%
  html_nodes('p') %>%
  html_text()
as.data.frame(read.dcf(textConnection(new)))
-output
Cell Email Professional Credentials
1 (240) 668-4241 barry.aberdeen@tributesports.com NBA Certified Player Agent, FIBA Certified Player Agent, WNBA Certified Player Agent
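To see why read.dcf() fits here without hitting the site, a minimal sketch with made-up strings may help. The vector p_text below is hypothetical and only stands in for what html_text() returns for one div; the values are the placeholders from the question, not real data.

```r
# Hypothetical <p> texts for one accordion-inner div (placeholder values)
p_text <- c(
  "Cell: (123) 456-7890",
  "Email: firstlast@email.com",
  "Professional Credentials: NBA Certified Player Agent"
)

# read.dcf() treats consecutive "Field: value" lines as one record,
# so three lines with no blank line between them become a 1-row matrix
con <- textConnection(p_text)
row <- as.data.frame(read.dcf(con), stringsAsFactors = FALSE)
close(con)

names(row)
row
```

The text before the first colon on each line becomes the column name and the rest becomes the value, which is why the field names do not need to be known in advance.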
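For multiple elements, loop over the nodes with map_dfr() from purrr, which row-binds the results and fills fields missing from a node with NA: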
library(purrr)
library(dplyr)
library(stringr)
out <- map_dfr(agents_page_elements, ~ {
  new <- .x %>%
    html_nodes('p') %>%
    html_text() %>%
    # collapse line breaks inside a field so each field stays on one line
    str_replace_all("\n\\s*", " ")
  # skip divs that contain no <p> elements
  if (length(new) > 0) {
    as.data.frame(read.dcf(textConnection(new)))
  } else NULL
})
-output
> dim(out)
[1] 455 9
> head(out, 2)
Cell Email Professional Credentials Title
1 (240) 668-4241 barry.aberdeen@tributesports.com NBA Certified Player Agent, FIBA Certified Player Agent, WNBA Certified Player Agent <NA>
2 (281) 773-7339 <NA> Texas Bar No. 24050197|Wisconsin Bar No. 1045470 Attorney
Company Name Education Address Office International
1 <NA> <NA> <NA> <NA> <NA>
2 Adams & Associates, LLC University of Southern California B.S. |University of Houston - M.B.A |University of Wisconsin - J.D. <NA> <NA> <NA>
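If the tidyverse packages are not available, the same idea can be sketched in base R. This is only an illustration under assumptions: the records list and the parse_fields() helper are hypothetical names, and the strings are invented stand-ins for the html_text() output of three divs (one of them empty).

```r
# Hypothetical helper: turn the "Field: value" lines of one div into a 1-row data frame
parse_fields <- function(p_text) {
  # collapse embedded line breaks so each field sits on a single line
  p_text <- gsub("\n\\s*", " ", p_text)
  con <- textConnection(p_text)
  on.exit(close(con))
  as.data.frame(read.dcf(con), stringsAsFactors = FALSE)
}

# Made-up stand-ins for three divs with different fields; the last has no <p> text
records <- list(
  c("Cell: (123) 456-7890", "Email: firstlast@email.com"),
  c("Title: Attorney", "Education: University A\n    University B"),
  character(0)
)

# Parse the non-empty records only
rows <- lapply(Filter(length, records), parse_fields)

# Pad every row with the columns it lacks, then stack the rows
all_cols <- unique(unlist(lapply(rows, names)))
filled <- lapply(rows, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA_character_
  df[all_cols]
})
out <- do.call(rbind, filled)
out
```

Padding each row to the union of column names before rbind() plays the role that plyr::rbind.fill() or map_dfr() plays above: fields a record does not have end up as NA.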