how do you scrape tables with the 'ngcontent' format using selenium/python?

Time:08-05

Basic tables are fairly easy to scrape with Selenium, but I am having trouble scraping tables whose elements carry "_ngcontent" attributes, such as the one at https://material.angular.io/components/table/overview. I am trying to scrape it into a pandas DataFrame.


This is how far I got:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Raw string so the backslash in the Windows path is not read as an escape sequence
PATH = r"C:\chromedriver.exe"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(PATH))

URL = 'https://material.angular.io/components/table/overview'

driver.get(URL)

# Wait for Angular to render the header of the first example table
titles = WebDriverWait(driver, 20).until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, '#table-basic > div > div.docs-example-viewer-body.ng-star-inserted > table-basic-example > table > thead')))
print(titles.text)

I was only able to get the header element, whose text is 'No. Name Weight Symbol', but I can't work out how to iterate through the rows and scrape the data.

Please assist

Answer:

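The "_ngcontent-..." attributes are only markers that Angular's emulated view encapsulation adds for component-scoped CSS; the rendered page still contains an ordinary HTML table. To grab its data easily, wait for the table with Selenium and hand its outerHTML to pandas: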

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
# driver.get() returns None, so there is nothing to assign here
driver.get('https://material.angular.io/components/table/overview')
driver.maximize_window()

# Wait until Angular has rendered the first table on the page, then take its HTML
table = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, '(//table)[1]'))
).get_attribute("outerHTML")

# Recent pandas versions expect file-like input, so wrap the HTML string in StringIO
df = pd.read_html(StringIO(table))[0]
print(df)
driver.quit()

Output:

     No.    Name     Weight  Symbol
0    1   Hydrogen   1.0079      H
1    2     Helium   4.0026     He
2    3    Lithium   6.9410     Li
3    4  Beryllium   9.0122     Be
4    5      Boron  10.8110      B
5    6     Carbon  12.0107      C
6    7   Nitrogen  14.0067      N
7    8     Oxygen  15.9994      O
8    9   Fluorine  18.9984      F
9   10       Neon  20.1797     Ne
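To see why the "_ngcontent" attributes are not an obstacle, here is a minimal offline sketch using only Python's standard html.parser module. The SAMPLE_HTML snippet is hypothetical markup modeled on the rendered Angular Material example (the attribute names such as "_ngcontent-ng-c123" are illustrative, not copied from the site), and TableParser and parse_table are hypothetical helper names. The parser simply ignores every attribute and collects the text of each th/td cell, grouped by tr.

```python
from html.parser import HTMLParser

# Hypothetical markup modeled on an Angular Material table; the
# _ngcontent-* attribute names are illustrative placeholders.
SAMPLE_HTML = """
<table _ngcontent-ng-c123 mat-table class="mat-mdc-table">
  <thead _ngcontent-ng-c123>
    <tr _ngcontent-ng-c123 mat-header-row>
      <th _ngcontent-ng-c123 mat-header-cell>No.</th>
      <th _ngcontent-ng-c123 mat-header-cell>Name</th>
      <th _ngcontent-ng-c123 mat-header-cell>Weight</th>
      <th _ngcontent-ng-c123 mat-header-cell>Symbol</th>
    </tr>
  </thead>
  <tbody _ngcontent-ng-c123>
    <tr _ngcontent-ng-c123 mat-row>
      <td _ngcontent-ng-c123 mat-cell> 1 </td>
      <td _ngcontent-ng-c123 mat-cell> Hydrogen </td>
      <td _ngcontent-ng-c123 mat-cell> 1.0079 </td>
      <td _ngcontent-ng-c123 mat-cell> H </td>
    </tr>
    <tr _ngcontent-ng-c123 mat-row>
      <td _ngcontent-ng-c123 mat-cell> 2 </td>
      <td _ngcontent-ng-c123 mat-cell> Helium </td>
      <td _ngcontent-ng-c123 mat-cell> 4.0026 </td>
      <td _ngcontent-ng-c123 mat-cell> He </td>
    </tr>
  </tbody>
</table>
"""


class TableParser(HTMLParser):
    """Collects the stripped text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        # attrs (including any _ngcontent-* ones) are deliberately ignored
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    rows = parse_table(SAMPLE_HTML)
    header, body = rows[0], rows[1:]
    print(header)
    for row in body:
        print(row)
```

In the Selenium flow, the input string would be the element's get_attribute("outerHTML") result, and the rows can become a DataFrame with pd.DataFrame(body, columns=header). pd.read_html does the same job in one call but needs a parser backend such as lxml installed.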