Not scraping the whole HTML and table

Time:07-21

I am trying to extract the table from this website: https://www.jaivikkheti.in/inputsupplier. When I scrape it, the response does not contain the full HTML, and the table tag has no class or id. Can anyone suggest how to extract it?


Answer:

You can use the Selenium library in Python. It drives a real browser, so it sees the table after the page's JavaScript has run.

Here is an example of how to do it. Before you run the script below, make sure you have installed selenium and numpy with pip, and that Firefox is installed.

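from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import numpy as np

# run Firefox headless (without opening a browser window)
options = webdriver.FirefoxOptions()
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)
driver.get("https://www.jaivikkheti.in/inputsupplier")

# the table is filled in by JavaScript, so wait (up to 20 seconds)
# until at least one body cell is present before reading it
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.TAG_NAME, "td")))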

# this line returns the table header cells (<th> elements) as WebElement objects
header_html = driver.find_elements(
    By.TAG_NAME, "th")

# this list comprehension reads the visible text of each header cell
header_text = [item.text for item in header_html]

# this line returns the table body cells (<td> elements) as WebElement objects
body_html = driver.find_elements(
    By.TAG_NAME, "td")

"""
This list comprehension reads the text of each body cell, and numpy
reshapes the flat list into a 2D array with one row per table row:
    reshape(len(body_html) // len(header_text), len(header_text))
        len(body_html) // len(header_text) --> number of rows in the table
        len(header_text) --> number of columns in the table
This assumes every row has exactly one <td> per header; otherwise
reshape raises a ValueError.
"""

body_text = np.array([item.text for item in body_html]).reshape(
    len(body_html) // len(header_text), len(header_text))

print(header_text)
print(body_text)

# close the browser when done
driver.quit()
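If some rows have missing or extra cells, the reshape above fails. As an alternative, the rendered HTML in `driver.page_source` can be parsed row by row. This is a minimal sketch using only the standard-library `html.parser`; `TableRowParser` and `table_rows` are hypothetical helper names, and it assumes closing tags are present (a browser serializes the DOM with them) and that tables are not nested.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows: list of lists of cell text
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Call `rows = table_rows(driver.page_source)` while the browser is still open. If the table has a single header row, `rows[0]` is the header and `rows[1:]` are the body rows; a short row simply stays a shorter list instead of breaking the reshape.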

Answer:

It's not giving you the full HTML because the table is rendered by JavaScript, so you have to let that JavaScript run and wait for the page to fully load before reading the table.
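One way to confirm this diagnosis is to check whether the raw HTML (what a plain HTTP request returns) contains any `<td>` cells at all. This is a minimal sketch using the standard-library `html.parser`; `CellCounter` and `has_table_cells` are hypothetical helper names. It counts `<td>` rather than `<table>`, since the page may ship an empty table shell that is filled in later.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <td> elements in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.td_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.td_count += 1


def has_table_cells(html):
    """Return True if `html` contains at least one <td> element."""
    counter = CellCounter()
    counter.feed(html)
    counter.close()
    return counter.td_count > 0
```

For example, pass it the text from `urllib.request.urlopen(url).read().decode("utf-8", "replace")`. If it returns `False` while the table is visible in a browser, the rows are being added by JavaScript.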

You can use Selenium if you know how to use it.

Or

You can send the request through an API that supports JavaScript rendering, such as:

https://www.scraperapi.com/
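A rendering API is typically called by passing the target page as a query parameter. This sketch only builds the request URL and assumes ScraperAPI's endpoint `http://api.scraperapi.com` with the `api_key`, `url` and `render` parameters as described in its documentation; `scraperapi_url` is a hypothetical helper, and you should check the current ScraperAPI docs before relying on these names.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names taken from ScraperAPI's docs;
# verify them against the current documentation before use.
SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com"


def scraperapi_url(api_key, target_url, render_js=True):
    """Build a ScraperAPI request URL, optionally asking for JS rendering."""
    params = {"api_key": api_key, "url": target_url}
    if render_js:
        params["render"] = "true"
    # urlencode percent-encodes the target URL so it survives as one parameter
    return SCRAPERAPI_ENDPOINT + "?" + urlencode(params)
```

Fetching `scraperapi_url("YOUR_API_KEY", "https://www.jaivikkheti.in/inputsupplier")` with `urllib.request.urlopen` or `requests.get` should then return the HTML after rendering, which you can parse with any HTML parser.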
