I would like to scrape the table that appears when you go to this website:
And take note of the "Payload" tab:
Its contents will be used later as the data dictionary in the example below.
Great, but how do I get the data including paginating the page?
To get the data with pagination, post the payload to the servlet, parse the HTML table in the response, and increment pageNo
for each page (this example targets the "eTenders" table/tab):
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Form fields copied from the "Payload" tab of the request
data = {
    "action": "geteCMSList",
    "keyword": "",
    "officeId": "0",
    "contractAwardTo": "",
    "contractStartDtFrom": "",
    "contractStartDtTo": "",
    "contractEndDtFrom": "",
    "contractEndDtTo": "",
    "departmentId": "",
    "tenderId": "",
    "procurementMethod": "",
    "procurementNature": "",
    "contAwrdSearchOpt": "Contains",
    "exCertSearchOpt": "Contains",
    "exCertificateNo": "",
    "tendererId": "",
    "procType": "",
    "statusTab": "eTenders",
    "pageNo": "1",
    "size": "10",
    "workStatus": "All",
}

# Column names to assign to each parsed page
_columns = [
    "S. No",
    "Ministry, Division, Organization, PE",
    "Procurement Nature, Type & Method",
    "Tender/Proposal ID, Ref No., Title..",
    "Contract Awarded To",
    "Company Unique ID",
    "Experience Certificate No ",
    "Contract Amount",
    "Contract Start & End Date",
    "Work Status",
]

for page in range(1, 11):  # <--- Increase number of pages here
    print(f"Page: {page}")
    data["pageNo"] = page

    response = requests.post(
        "https://www.eprocure.gov.bd/AdvSearcheCMSServlet", data=data
    )
    # The HTML is missing a `table` tag, so we need to fix it
    soup = BeautifulSoup("<table>" + response.text + "</table>", "html.parser")
    # Wrap the markup in StringIO; recent pandas versions deprecate raw HTML strings
    df = pd.read_html(StringIO(str(soup)))[0]
    df.columns = _columns
    print(df.to_string())
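If you don't know the number of pages in advance, one option is to keep requesting pages until one comes back empty. The sketch below shows only that stopping logic. It assumes the server returns no rows once you go past the last page, which I have not verified for this site. `iter_pages`, `fetch_rows`, and `fake_fetch` are hypothetical names used for illustration; in practice `fetch_rows` would post `data` with the given page number and parse the response as in the loop above.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_rows: Callable[[int], List[list]],
    max_pages: int = 1000,
) -> Iterator[List[list]]:
    """Yield the rows of each page until a page comes back empty.

    `fetch_rows` is a hypothetical callable: given a page number, it
    returns that page's rows as a list of lists.
    """
    for page in range(1, max_pages + 1):
        rows = fetch_rows(page)
        if not rows:  # assumption: an empty page means no more results
            break
        yield rows


# Stand-in fetcher for illustration only: three pages, then nothing.
def fake_fetch(page: int) -> List[list]:
    return [[page, f"row on page {page}"]] if page <= 3 else []


all_rows = [row for rows in iter_pages(fake_fetch) for row in rows]
print(len(all_rows))
```

The `max_pages` cap is there so the loop still terminates if the server never returns an empty page.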
Going further
How do I select the different tabs/tables on the page?
To select a different tab on the page, change the "statusTab" value in the data dictionary.
Inspect the "Payload" tab again while clicking through the tabs, and you'll see what I mean.
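As a rough sketch of that idea, a small helper can build a fresh payload per tab and page without mutating the original dictionary. `payload_for` is a hypothetical helper, and "someOtherTab" is a placeholder rather than a real tab name: copy the actual "statusTab" value from the Payload tab in your browser.

```python
def payload_for(base: dict, status_tab: str, page_no: int, size: int = 10) -> dict:
    """Return a copy of `base` with the tab and paging fields replaced."""
    payload = dict(base)  # shallow copy so the original dict is untouched
    payload["statusTab"] = status_tab
    payload["pageNo"] = str(page_no)
    payload["size"] = str(size)
    return payload


# Trimmed-down base payload; in practice, pass the full `data` dict.
base = {"action": "geteCMSList", "statusTab": "eTenders", "pageNo": "1", "size": "10"}

# "someOtherTab" is a placeholder, not a real tab name.
other = payload_for(base, "someOtherTab", page_no=2)
print(other["statusTab"], other["pageNo"])
print(base["statusTab"], base["pageNo"])  # base is unchanged
```

The returned dictionary can then be passed as `data=` to `requests.post` in place of the original payload. Note that other tabs may return different columns, so the column list may need adjusting too.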
Output
The above code outputs:
S. No Ministry, Division, Organization, PE Procurement Nature, Type & Method Tender/Proposal ID, Ref No., Title.. Contract Awarded To Company Unique ID Experience Certificate No\t Contract Amount Contract Start & End Date Work Status
0 1 Ministry of Education, Education Engineering Department, Office of the Executive Engineer, EED,Kishoreganj Zone. Works, NCT, LTM 300580, 932/EE/EED/KZ/Rev-5974/2018-19/23, Dt: 28/03/2019 Repair and Renovation Works at Chowganga Shahid Smrity High School Itna Kishoreganj. 01-Apr-2019 M/S KAZI RASEL NIRMAN SONGSTA 1051854 WD-5974- 25/e-GP/20221228/300580/0060000 475000.000 10-Jun-2019 03-Sep-2019 Completed
1 2 Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division Works, NCT, LTM 558656, CMD/T-19/100 Dated: 14-03-2021 Manufacturing supplying & installation of 01 No MS Flap gate size - 1.65 m 1.95m and 01 no. Padestal type lifting device for sluice no S-15 6-vent 02 nos MS Vertical gate size - 1.65 m 1.95m for sluice no S-15 6-vent and sluice no S-14 new 1-vent at Coxs Bazar Sadar Upazilla of CEP Polder No 66/1 under Coxsbazar O&M Division implemented by Chattogram Mechanical Division BWDB Madunaghat Chattogram during the financial year 2020-21. 15-Mar-2021 M/S. AN Corporation 1063426 CMD/COX/LTM-16/2020-21/e-GP/20221228/558656/0059991 503470.662 12-Apr-2021 05-May-2021 Completed
2 3 Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division Works, NCT, LTM 633496, CMD/T-19/263 Dated: 30-11-2021 Manufacturing, supplying & installation of 07 No M.S Flap gate for sluice no.- 6 (1-vent), sluice no.- 7 (2-vent), sluice no.-8 (2-vent), sluice no.-35 (2-vent) size :- (1.00 m ×1.00m), 01 No Padestal type lifting device for sluice no- 13(1-vent) for CEP Polder No 64/2B, at pekua Upazilla under Chattogram Mechanical Division, BWDB, Madunaghat, Chattogram, during the financial year 2021-22. 30-Nov-2021 M/S. AN Corporation 1063426 CMD/LTM-08/2021-22/e-GP/20221228/633496/0059989 648808.272 26-Dec-2021 31-Jan-2022 Completed
...
...