I want to scrape a table with BeautifulSoup

Hi, I am new to Stack Overflow.

I am trying to scrape the table under the heading "Import VAT and excise" from this website for the commodity code "1704906500". I know for sure that the table always falls under "Import VAT and excise". I have several commodity codes and will loop through all of them. The problem is that I cannot point to the table under "Import VAT and excise" for scraping.

Please advise.

import pandas as pd
import re
import requests
from bs4 import BeautifulSoup, NavigableString, Tag
comCode = "1704906500"
url = "https://www.trade-tariff.service.gov.uk/commodities/" + comCode + "?currency=GBP#import"
url_request = requests.get(url).text
soup = BeautifulSoup(url_request, "lxml")

for header in soup.find_all('h3', string=re.compile('Import VAT and excise')):
    nextNode = header
    # Walk the heading's siblings until the next h3 or the end of the parent
    while True:
        nextNode = nextNode.next_sibling
        if nextNode is None:
            break
        if isinstance(nextNode, Tag):
            if nextNode.name == "h3":
                break
            print(nextNode)
            #comm_table = pd.read_html(nextNode.text, attrs = {"table class":"small-table measures govuk-table"} )

CodePudding user response：

You could select the heading and then call .find_next('table') on it:

soup.find('h3', string=re.compile('Import VAT and excise')).find_next('table')

Or, as an alternative, with CSS selectors:

soup.select_one('h3:-soup-contains("Import VAT and excise")').find_next('table')
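
To see what .find_next('table') does without hitting the network, here is a minimal sketch on a made-up HTML fragment. The markup below is an assumption for illustration and is not the real structure of the trade tariff page; the point is only that find_next searches forward through the whole document, while find_next_sibling looks only at elements on the same level as the heading.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each heading is followed by a table wrapped in a div.
html = """
<h3>Import duties</h3>
<div><table id="duties"><tr><td>Third country duty</td></tr></table></div>
<h3>Import VAT and excise</h3>
<div><table id="vat"><tr><td>VAT standard rate</td></tr></table></div>
"""

soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h3", string="Import VAT and excise")

# find_next walks forward in document order, so it reaches the nested table.
table = heading.find_next("table")

# find_next_sibling only checks same-level elements; the table sits inside a div.
sibling_table = heading.find_next_sibling("table")

print(table["id"])
print(sibling_table)
```

If the table on the real page is nested inside a wrapper element, this difference may be why a sibling-based search needs extra handling, though that depends on the actual markup.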

Example

Iterate over a list of commodity codes and concatenate all the tables into one DataFrame:

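from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

comCode = ["1704906500"]

data = []

for c in comCode:
    url = f'https://www.trade-tariff.service.gov.uk/commodities/{c}?currency=GBP#import'
    # Name the parser explicitly to avoid the "no parser specified" warning
    soup = BeautifulSoup(requests.get(url).text, "lxml")
    table = soup.select_one('h3:-soup-contains("Import VAT and excise")').find_next('table')
    # Newer pandas versions expect a file-like object rather than a raw HTML string
    data.append(pd.read_html(StringIO(str(table)))[0])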

pd.concat(data)
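
One thing to watch when looping over many codes: if a page has no "Import VAT and excise" heading, select_one returns None and the chained .find_next call raises an AttributeError. Below is a rough sketch of a guard for that case. The function name extract_table_rows and the sample markup are hypothetical and only there for illustration; it returns plain lists of cell text instead of a DataFrame so it can be tried without pandas or a network call.

```python
from bs4 import BeautifulSoup


def extract_table_rows(html, heading_text="Import VAT and excise"):
    """Return the rows of the first table after the heading, or None if absent.

    Hypothetical helper, not part of BeautifulSoup or pandas.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.select_one(f'h3:-soup-contains("{heading_text}")')
    if heading is None:
        return None  # this page has no such heading
    table = heading.find_next("table")
    if table is None:
        return None  # heading exists, but no table follows it
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample page, assumed for the demo only.
sample = (
    "<h3>Import VAT and excise</h3>"
    "<table><tr><th>Measure</th><th>Rate</th></tr>"
    "<tr><td>VAT standard rate</td><td>20%</td></tr></table>"
)
print(extract_table_rows(sample))
print(extract_table_rows("<h3>Other</h3><table><tr><td>x</td></tr></table>"))
```

In the real loop you could skip any code for which the helper returns None, and build a DataFrame from the remaining rows before concatenating.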