I'm trying to scrape data from a website. I only need the inner HTML of one tbody element, which I want to convert to JSON so it is easier to work with and to save to a file later. So far I've only managed to read one element at a time using find_element(By.XPATH) from Selenium. Is there a way to read the whole inner HTML of the tbody and then parse it into JSON? requests won't work because the table is inside an iframe.
The tbody belongs to the scrolling table on the website titled "Tình hình dịch cả nước" ("Epidemic situation nationwide"). I only want the table without its title, and the table header if possible.
The code for reading an element:
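import time
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("https://covid19.gov.vn/")
time.sleep(3)  # crude wait for the page and its iframe to load
# Switch into the iframe that holds the table before locating elements inside it
browser.switch_to.frame(browser.find_element(By.XPATH, '/html/body/div[1]/div[2]/div[3]/div/iframe'))
value = browser.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div[2]/div[1]/span[4]')
print(value.text)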
CodePudding user response:
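I've found an answer to this: get_attribute('innerHTML') returns the element's inner HTML as a string, so pointing the XPath at the tbody gives the whole table body in one call: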
find_element(By.XPATH,'xpath').get_attribute('innerHTML')
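Once the inner HTML string is available, turning it into JSON could look roughly like the sketch below. It uses only the standard library; the TableBodyParser class, the tbody_to_json helper, the column names and the sample rows are all made up for illustration and are not taken from the real page.

```python
import json
from html.parser import HTMLParser


class TableBodyParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span>) also ends up in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def tbody_to_json(inner_html, columns):
    """Return a JSON string with one object per table row."""
    parser = TableBodyParser()
    parser.feed(inner_html)
    parser.close()
    records = [dict(zip(columns, row)) for row in parser.rows]
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical sample standing in for the string returned by get_attribute('innerHTML').
sample = "<tr><td>Hà Nội</td><td>1.000</td></tr><tr><td>Đà Nẵng</td><td>200</td></tr>"
result = tbody_to_json(sample, ["province", "cases"])
print(result)
```

With Selenium, the sample string would be replaced by the value read from the tbody element, and the column list would need to match the real table's columns.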
CodePudding user response:
Just call the same endpoint the page itself calls; it returns JSON, so there is no need for Selenium or HTML parsing.
import requests
import pandas as pd
r = requests.get('https://static.pipezero.com/covid/data.json').json()
location_json = r['locations']
df = pd.DataFrame(location_json)
print(df)
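Since the question also mentions saving the data to a file, here is a minimal sketch of that step. It assumes the decoded response is a dict with a 'locations' list, as in the snippet above. The save_locations helper, the output file name and the sample field names are hypothetical, and the real records may be shaped differently.

```python
import json
import os
import tempfile


def save_locations(payload, path):
    """Write the 'locations' list of a decoded response to a JSON file."""
    locations = payload.get("locations", [])
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Vietnamese province names readable in the file
        json.dump(locations, fh, ensure_ascii=False, indent=2)
    return len(locations)


# Made-up sample standing in for requests.get(...).json(); field names are assumptions.
sample_payload = {
    "locations": [
        {"name": "Hà Nội", "cases": 1000},
        {"name": "Đà Nẵng", "cases": 200},
    ]
}
out_path = os.path.join(tempfile.gettempdir(), "locations_sample.json")
written = save_locations(sample_payload, out_path)
print(f"wrote {written} records to {out_path}")
```

In the real script, r from the snippet above would be passed in place of sample_payload.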