Scrape table from JSP website using Python

Time:12-29

I would like to scrape the table that appears when you go to this website.

Take note of the request's "Payload" tab in the browser's developer tools. Its form fields will later be used as "data" in the example below.
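If the payload is copied from the developer tools as a raw URL-encoded string, it can be turned into a Python dict instead of being typed out by hand. The following is only a minimal sketch: the helper name payload_to_dict is made up for illustration, and the string is a shortened stand-in for the real payload, which has more fields.

```python
from urllib.parse import parse_qsl


def payload_to_dict(raw: str) -> dict:
    """Turn a URL-encoded form body into a dict, keeping empty fields."""
    # keep_blank_values=True preserves fields such as "keyword=" that
    # would otherwise be dropped
    return dict(parse_qsl(raw, keep_blank_values=True))


# Shortened, illustrative payload string (the real one has more fields)
raw_payload = "action=geteCMSList&keyword=&statusTab=eTenders&pageNo=1&size=10"
data = payload_to_dict(raw_payload)
print(data)
```

Empty fields are kept on purpose, since the form data in this post sends many of them as empty strings.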

Great, but how do I get the data including paginating the page?

To get the data, including pagination, POST the form data to the servlet and increase "pageNo" for each page. The example below fetches the HTML table this way (this is for the "eTenders" table/tab):

import requests
import pandas as pd
from bs4 import BeautifulSoup


data = {
    "action": "geteCMSList",
    "keyword": "",
    "officeId": "0",
    "contractAwardTo": "",
    "contractStartDtFrom": "",
    "contractStartDtTo": "",
    "contractEndDtFrom": "",
    "contractEndDtTo": "",
    "departmentId": "",
    "tenderId": "",
    "procurementMethod": "",
    "procurementNature": "",
    "contAwrdSearchOpt": "Contains",
    "exCertSearchOpt": "Contains",
    "exCertificateNo": "",
    "tendererId": "",
    "procType": "",
    "statusTab": "eTenders",
    "pageNo": "1",
    "size": "10",
    "workStatus": "All",
}


_columns = [
    "S. No",
    "Ministry, Division, Organization, PE",
    "Procurement Nature, Type & Method",
    "Tender/Proposal ID, Ref No., Title..",
    "Contract Awarded To",
    "Company Unique ID",
    "Experience Certificate No  ",
    "Contract Amount",
    "Contract Start & End Date",
    "Work Status",
]

for page in range(1, 11):  # <--- Increase number of pages here
    print(f"Page: {page}")
    data["pageNo"] = page

    response = requests.post(
        "https://www.eprocure.gov.bd/AdvSearcheCMSServlet", data=data
    )
    # The HTML is missing a `table` tag, so we wrap the response in one
    soup = BeautifulSoup("<table>" + response.text + "</table>", "html.parser")
    df = pd.read_html(str(soup))[0]

    df.columns = _columns
    print(df.to_string())
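The loop above uses a fixed number of pages. If the total is not known in advance, one option is to keep requesting pages until one comes back without any rows. The sketch below is not part of the original answer and rests on an unverified assumption: that a page past the end contains no `<tr>` elements. The names iter_pages and fake_fetch are hypothetical, and fake_fetch stands in for the real POST request so that the example runs offline.

```python
from typing import Callable, Iterator, Tuple


def iter_pages(
    fetch_page: Callable[[int], str], max_pages: int = 1000
) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, html_fragment) until a page has no table rows."""
    for page in range(1, max_pages + 1):
        fragment = fetch_page(page)
        # Assumption: a page past the end contains no <tr> elements
        if "<tr" not in fragment.lower():
            break
        yield page, fragment


def fake_fetch(page: int) -> str:
    """Offline stand-in for the real request: three pages, then nothing."""
    return f"<tr><td>{page}</td></tr>" if page <= 3 else ""


pages = [page for page, _ in iter_pages(fake_fetch)]
print(pages)
```

Against the real site, fetch_page would wrap the requests.post call from the example and return response.text for the given page number. The max_pages cap is there so the loop still ends if the assumption about empty pages turns out to be wrong.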

Going further

How do I select the different tabs/tables on the page?

To select a different tab on the page, change the value of "statusTab" in the data. Click another tab on the page and inspect the "Payload" tab again to see which value it sends.
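As a rough sketch of that idea, a small helper can build the form data for a given tab and page without modifying the original dict. The helper name payload_for is made up for illustration, the base dict is shortened, and "otherTab" is a hypothetical placeholder: the real tab values have to be read from the "Payload" tab.

```python
def payload_for(base: dict, status_tab: str, page_no: int = 1) -> dict:
    """Return a copy of the form data for the given tab and page."""
    payload = dict(base)  # copy so the original dict is left untouched
    payload["statusTab"] = status_tab
    payload["pageNo"] = str(page_no)
    return payload


# Shortened version of the form data from the example
base = {"action": "geteCMSList", "statusTab": "eTenders", "pageNo": "1", "size": "10"}

# "otherTab" is a hypothetical placeholder, not a real tab name
other = payload_for(base, "otherTab", page_no=2)
print(other["statusTab"], other["pageNo"])
```

The returned dict can be passed as data= to requests.post in the same way as in the example.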

Output

The above code outputs:

   S. No                                                                              Ministry, Division, Organization, PE Procurement Nature, Type & Method                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Tender/Proposal ID, Ref No., Title..             Contract Awarded To  Company Unique ID                                                    Experience Certificate No\t  Contract Amount Contract Start & End Date Work Status
0      1  Ministry of Education, Education Engineering Department, Office of the Executive Engineer, EED,Kishoreganj Zone.                   Works, NCT, LTM                                                                                                                                                                                                                                                                                                                                                                  300580, 932/EE/EED/KZ/Rev-5974/2018-19/23, Dt: 28/03/2019 Repair and Renovation Works at Chowganga Shahid Smrity High School Itna Kishoreganj. 01-Apr-2019   M/S KAZI RASEL NIRMAN SONGSTA            1051854                                       WD-5974- 25/e-GP/20221228/300580/0060000       475000.000   10-Jun-2019 03-Sep-2019   Completed
1      2            Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division                   Works, NCT, LTM                       558656, CMD/T-19/100 Dated: 14-03-2021 Manufacturing supplying & installation of 01 No MS Flap gate size - 1.65 m 1.95m and 01 no. Padestal type lifting device for sluice no S-15 6-vent 02 nos MS Vertical gate size - 1.65 m 1.95m for sluice no S-15 6-vent and sluice no S-14 new 1-vent at Coxs Bazar Sadar Upazilla of CEP Polder No 66/1 under Coxsbazar O&M Division implemented by Chattogram Mechanical Division BWDB Madunaghat Chattogram during the financial year 2020-21. 15-Mar-2021             M/S. AN Corporation            1063426                            CMD/COX/LTM-16/2020-21/e-GP/20221228/558656/0059991       503470.662   12-Apr-2021 05-May-2021   Completed
2      3            Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division                   Works, NCT, LTM                                                                 633496, CMD/T-19/263 Dated: 30-11-2021 Manufacturing, supplying & installation of 07 No M.S Flap gate for sluice no.- 6 (1-vent), sluice no.- 7 (2-vent), sluice no.-8 (2-vent), sluice no.-35 (2-vent) size :- (1.00 m ×1.00m), 01 No Padestal type lifting device for sluice no- 13(1-vent) for CEP Polder No 64/2B, at pekua Upazilla under Chattogram Mechanical Division, BWDB, Madunaghat, Chattogram, during the financial year 2021-22. 30-Nov-2021             M/S. AN Corporation            1063426                                CMD/LTM-08/2021-22/e-GP/20221228/633496/0059989       648808.272   26-Dec-2021 31-Jan-2022   Completed
...
...