Trying to scrape IPO table data from here: https://www.iposcoop.com/last-12-months/
Here is my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.iposcoop.com/last-12-months/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table1 = soup.find("table",id='DataTables_Table_0')
table1_data = table1.tbody.find_all("tr")
table1
However, table1 is None (NoneType). Why is that, and how can I fix it? I have read related questions, and an iframe doesn't seem to be the cause.
CodePudding user response:
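You can grab the table data using pandas:
from io import StringIO
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.iposcoop.com/last-12-months'
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')
# Select the table by its CSS classes, which are present in the raw HTML
table = soup.select_one('.standard-table.ipolist')
# Recent pandas versions deprecate passing literal HTML, so wrap it in StringIO
table_data = pd.read_html(StringIO(str(table)))[0]
print(table_data)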
Output:
Company Symbol ... Return SCOOP Rating
0 Akanda Corp. AKAN ... 85.00% S/O
1 The Marygold Companies, Inc. (aka Concierge Te... MGLD ... 9.50% S/O
2 Blue Water Vaccines, Inc. BWV ... 343.33% S/O
3 Meihua International Medical Technologies MHUA ... -33.00% S/O
4 Vivakor, Inc. VIVK ... -49.40% S/O
.. ... ... ... ... ...
355 Khosla Ventures Acquisition Co. III KVSC ... -2.80% S/O
356 Dragoneer Growth Opportunities Corp. III DGNU ... -2.40% S/O
357 Movano Inc. MOVE ... -43.60% S/O
358 Supernova Partners Acquisition Company III STRE.U ... 0.10% S/O
359 Universe Pharmaceuticals UPC ... -74.00% S/O
[360 rows x 10 columns]
CodePudding user response:
While F.Hoque's answer gives you a solution, it does not seem to answer why your code throws an error.
You are trying to find a table with the id DataTables_Table_0. If you open the page in a browser, you can see that an element with this id exists. But if you open the same page with JavaScript disabled, the id is no longer present on the table. The id is assigned by some JavaScript module after the page loads.
requests only fetches the base HTML and BeautifulSoup only parses it; neither runs JavaScript. So you have 2 solutions:
- Use a selector that exists in the base HTML (in this case .standard-table.ipolist)
- Use Selenium to run the JavaScript and fetch the HTML as it is seen in a browser
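The difference between the two lookups can be reproduced offline. The sketch below is only an illustration: RAW_HTML is a made-up stand-in for what the server sends (a table with the classes but without the JavaScript-assigned id), and the company names in it are hypothetical, not data from the site. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML returned by requests:
# the table has its CSS classes, but no id, because the id is
# only added later by JavaScript running in a browser.
RAW_HTML = """
<table class="standard-table ipolist">
  <thead><tr><th>Company</th><th>Symbol</th></tr></thead>
  <tbody>
    <tr><td>Example Corp.</td><td>EXMP</td></tr>
    <tr><td>Sample Inc.</td><td>SMPL</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# Lookup by the JavaScript-assigned id: nothing matches, so find() returns None.
by_id = soup.find("table", id="DataTables_Table_0")

# Lookup by classes that exist in the base HTML: this finds the table.
by_class = soup.select_one(".standard-table.ipolist")

# Guard against None before touching .tbody, which is what raised the
# AttributeError in the question.
if by_class is None:
    raise SystemExit("table not found in the static HTML")

rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in by_class.tbody.find_all("tr")
]

print(by_id)
print(rows)
```

Checking the result of find() or select_one() for None before using it makes this kind of mismatch between the browser's DOM and the raw HTML easier to spot.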