I am writing a small program to fetch stock exchange data using Python. The sample code below makes a request to a URL and should return the appropriate data. Here is the resource that I am using: https://python.plainenglish.io/4-python-libraries-to-help-you-make-money-from-webscraping-57ba6d8ce56d
from xml.dom.minidom import Element
from selenium import webdriver
from bs4 import BeautifulSoup
import logging
from selenium.webdriver.common.by import By
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
url = "http://eoddata.com/stocklist/NASDAQ/A.htm"
driver = webdriver.Chrome(executable_path="C:\Program Files\Chrome\chromedriver")
page = driver.get(url)
# TODO: find element by CSS selector
stock_symbol = driver.find_elements(by=By.CSS_SELECTOR, value='#ctl00_cph1_divSymbols')
soup = BeautifulSoup(driver.page_source, features="html.parser")
elements = []
table = soup.find('div', {'id','ct100_cph1_divSymbols'})
logging.info(f"{table}")
I've added a TODO at the line where I am trying to retrieve the element.
Expected: The proper data should be returned.
Actual: Nothing is returned.
Any help would be greatly appreciated.
Answer:
The table data isn't rendered dynamically by JavaScript, so you don't need Selenium here. You can fetch the page with requests and parse it with bs4 plus pandas, or use pandas alone.
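from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://eoddata.com/stocklist/NASDAQ/A.htm"
res = requests.get(url)
# The quotes table is in the static HTML, so no browser is needed
table = BeautifulSoup(res.text, 'html.parser').select_one('.quotes')
# read_html returns a list of DataFrames; wrap the HTML in StringIO
# because passing a literal HTML string is deprecated in recent pandas
df = pd.read_html(StringIO(str(table)))[0]
print(df)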
Output:
Code Name High ... Change.1 Change.2 Unnamed: 9
0 AACG Ata Creativity Global ADR 1.4300 ... NaN 2.99 NaN
1 AACI Armada Acquisition Corp I 9.8810 ... NaN 0.11 NaN
2 AACIU Armada Acquisition Corp I 9.9600 ... NaN 0.10 NaN
3 AACIW Armada Acquisition Corp I WT 0.1893 ... NaN 0.32 NaN
4 AADI Aadi Biosciences Inc 13.4000 ... NaN 2.70 NaN
.. ... ... ... ... ... ... ...
565 AZ A2Z Smart Technologies Corp 3.0200 ... NaN 15.20 NaN
566 AZN Astrazeneca Plc ADR 67.5000 ... NaN 0.03 NaN
567 AZPN Aspen Technology 189.7000 ... NaN 4.67 NaN
568 AZTA Azenta Inc 76.7800 ... NaN 1.10 NaN
569 AZYO Aziyo Biologics Inc Cl A 6.1000 ... NaN 3.00 NaN
[570 rows x 10 columns]
To grab table data using pandas only:
import pandas as pd

url = "https://eoddata.com/stocklist/NASDAQ/A.htm"
# attrs keeps only the <table> elements whose class attribute is "quotes"
df = pd.read_html(url, attrs={"class": "quotes"})[0]
print(df)
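Whether a page needs a browser at all can be checked roughly by looking for the target element in the HTML the server sends before any JavaScript runs. The sketch below assumes that test is good enough for this page; the helper name `in_static_html` and the sample markup are hypothetical, and in practice you would pass `requests.get(url).text` instead of the inline string.

```python
from bs4 import BeautifulSoup


def in_static_html(html: str, element_id: str) -> bool:
    """Return True if an element with this id is present in the raw HTML.

    Present: the server already sends the data, so requests + bs4 suffice.
    Absent: it may be added later by JavaScript, where Selenium can help.
    """
    return BeautifulSoup(html, "html.parser").find(id=element_id) is not None


# Hypothetical markup standing in for requests.get(url).text.
sample = '<div id="ctl00_cph1_divSymbols"><table class="quotes"></table></div>'

print(in_static_html(sample, "ctl00_cph1_divSymbols"))  # True
print(in_static_html(sample, "ct100_cph1_divSymbols"))  # False
```

If the id is missing from the raw HTML but visible in the browser, the content is probably injected by JavaScript, and Selenium (as in the question) becomes the better tool.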
Answer:
It is common practice to scrape tables with pandas.read_html() to get their text, so I would also recommend it.
But to answer your question and follow your approach, select the <div> and the <table> more specifically:
soup.select('#ctl00_cph1_divSymbols table')
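In the question's own code the lookup returns None for two reasons: `{'id','ct100_cph1_divSymbols'}` is a set literal rather than the dict `{'id': ...}`, so BeautifulSoup does not treat it as an id filter, and the id is spelled with the digits `100` instead of `l00` (compare the `ctl00_...` selector on the TODO line). A minimal sketch of the corrected lookup, run against hypothetical trimmed-down markup instead of `driver.page_source`:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup standing in for driver.page_source.
html = """
<div id="ctl00_cph1_divSymbols">
  <table class="quotes">
    <tr><th>Code</th><th>Name</th></tr>
    <tr><td>AACG</td><td>Ata Creativity Global ADR</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The question's call: a set instead of a dict, and "ct100" instead of "ctl00".
broken = soup.find("div", {"id", "ct100_cph1_divSymbols"})
print(broken)  # None

# attrs as a dict, with the id spelled "ctl00" (lowercase L, two zeros).
div = soup.find("div", {"id": "ctl00_cph1_divSymbols"})
table = div.find("table")

# The CSS selector from this answer reaches the same <table> element.
print(table is soup.select_one("#ctl00_cph1_divSymbols table"))  # True
```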
To get and store the data, you could iterate over the rows and append the results to a list:
data = []
header = list(soup.select_one('#ctl00_cph1_divSymbols table tr:has(th)').stripped_strings)
for row in soup.select('#ctl00_cph1_divSymbols table tr:has(td)'):
    d = dict(zip(header, row.stripped_strings))
    d.update({'url': 'https://eoddata.com' + row.a.get('href')})
    data.append(d)
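The string concatenation above assumes every `href` is a site-relative path starting with `/`, and `row.a.get(...)` fails with an `AttributeError` on any row that has no link. A slightly more defensive sketch uses `urllib.parse.urljoin` and reads the header once; the function name `parse_rows`, the choice of base URL and the sample markup (including the row without a link) are hypothetical:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Page URL used to resolve relative links (assumed to be the list page).
BASE_URL = "https://eoddata.com/stocklist/NASDAQ/A.htm"

# Hypothetical, trimmed-down markup; the real table has more columns.
HTML = """
<div id="ctl00_cph1_divSymbols"><table class="quotes">
  <tr><th>Code</th><th>Name</th><th>Close</th></tr>
  <tr><td><a href="/stockquote/NASDAQ/AACG.htm">AACG</a></td>
      <td>Ata Creativity Global ADR</td><td>1.380</td></tr>
  <tr><td>TEST</td><td>Hypothetical row without a link</td><td>0.000</td></tr>
</table></div>
"""


def parse_rows(soup, base_url):
    """Return one dict per data row, keyed by the header cell texts."""
    table = soup.select_one("#ctl00_cph1_divSymbols table")
    # Read the header once instead of on every loop iteration.
    header = list(table.select_one("tr:has(th)").stripped_strings)
    data = []
    for row in table.select("tr:has(td)"):
        d = dict(zip(header, row.stripped_strings))
        link = row.a
        # urljoin resolves "/stockquote/..." against the page URL;
        # rows without an <a> get None instead of raising.
        d["url"] = urljoin(base_url, link.get("href")) if link else None
        data.append(d)
    return data


rows = parse_rows(BeautifulSoup(HTML, "html.parser"), BASE_URL)
print(rows[0]["url"])  # https://eoddata.com/stockquote/NASDAQ/AACG.htm
```

The resulting list can be passed to `pd.DataFrame(...)` exactly as in the example that follows.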
Example
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://eoddata.com/stocklist/NASDAQ/A.htm"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')

# Header cells (Code, Name, High, ...) become the dict keys for every row
header = list(soup.select_one('#ctl00_cph1_divSymbols table tr:has(th)').stripped_strings)
data = []
for row in soup.select('#ctl00_cph1_divSymbols table tr:has(td)'):
    d = dict(zip(header, row.stripped_strings))
    # Hrefs are site-relative ("/stockquote/..."), so prefix the domain
    d.update({'url': 'https://eoddata.com' + row.a.get('href')})
    data.append(d)

df = pd.DataFrame(data)
print(df)
Output
 | Code | Name | High | Low | Close | Volume | Change | url
---|---|---|---|---|---|---|---|---
0 | AACG | Ata Creativity Global ADR | 1.390 | 1.360 | 1.380 | 8,900 | 0 | https://eoddata.com/stockquote/NASDAQ/AACG.htm
1 | AACI | Armada Acquisition Corp I | 9.895 | 9.880 | 9.880 | 5,400 | -0.001 | https://eoddata.com/stockquote/NASDAQ/AACI.htm
2 | AACIU | Armada Acquisition Corp I | 9.960 | 9.960 | 9.960 | 300 | -0.01 | https://eoddata.com/stockquote/NASDAQ/AACIU.htm
3 | AACIW | Armada Acquisition Corp I WT | 0.1900 | 0.1699 | 0.1700 | 36,400 | -0.0193 | https://eoddata.com/stockquote/NASDAQ/AACIW.htm
4 | AADI | Aadi Biosciences Inc | 13.40 | 12.66 | 12.90 | 98,500 | -0.05 | https://eoddata.com/stockquote/NASDAQ/AADI.htm
5 | AADR | Advisorshares Dorsey Wright ETF | 47.49 | 46.82 | 47.49 | 1,100 | 0.3 | https://eoddata.com/stockquote/NASDAQ/AADR.htm
6 | AAL | American Airlines Gp | 14.44 | 13.70 | 14.31 | 45,193,100 | -0.46 | https://eoddata.com/stockquote/NASDAQ/AAL.htm
...