I've been practicing web scraping, and this time I'm trying to get only the first column of data (only the stock symbols) all the way down, but it keeps pulling all the data from the table. I'm not sure what I'm doing wrong; any assistance would be appreciated. Thank you!
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_S&P_500_companies"
r = requests.get(url, headers=headers)
tables= pd.read_html(url, attrs={'id': 'constituents'})
df = df.iloc[1:]
print (df)
#df.to_csv('Stock_List.txt', index=False, encoding='utf-8')
Answer:
pd.read_html() returns a list of DataFrames (one per matching table), so first you have to get a single table from that list.
Then you can select the column Symbol:
df = tables[0]
df = df['Symbol']
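To see why indexing is needed, here is a minimal offline sketch that feeds read_html a tiny stand-in for the Wikipedia table (the sample rows are hypothetical, and StringIO is used only so the example runs without network access). It also shows that the `df.iloc[1:]` from the question is not needed: read_html already uses the header row for the column names, so slicing off row 0 drops the first company instead of the header.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia table, so no network is needed.
html = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M</td></tr>
  <tr><td>ABT</td><td>Abbott Laboratories</td></tr>
</table>
"""

# read_html always returns a list of DataFrames, even when attrs
# narrows the match down to a single table.
tables = pd.read_html(StringIO(html), attrs={"id": "constituents"})
print(type(tables), len(tables))  # <class 'list'> 1

df = tables[0]            # the single matching table
symbols = df["Symbol"]    # one column -> a Series
print(symbols.tolist())   # ['MMM', 'ABT']

# The <th> row already became the column names, so iloc[1:]
# removes the first company, not the header.
print(df.iloc[1:]["Symbol"].tolist())  # ['ABT']
```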
Full working code:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_S&P_500_companies"
tables = pd.read_html(url, attrs={'id': 'constituents'})
df = tables[0]
print(df['Symbol'])
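The question also had a commented-out to_csv line. A sketch of saving only the first column, one symbol per line, could look like this; the DataFrame is built by hand (hypothetical sample rows) to stand in for tables[0], and `df.iloc[:, 0]` picks the first column by position, which is the same as `df["Symbol"]` for this table.

```python
import pandas as pd

# Hypothetical sample rows standing in for tables[0] from read_html.
df = pd.DataFrame({
    "Symbol": ["MMM", "ABT", "ABBV"],
    "Security": ["3M", "Abbott Laboratories", "AbbVie"],
})

# First column by position (equivalent to df["Symbol"] here).
first_col = df.iloc[:, 0]

# Write one symbol per line, without the index and without the header line.
first_col.to_csv("Stock_List.txt", index=False, header=False, encoding="utf-8")

with open("Stock_List.txt", encoding="utf-8") as f:
    saved = f.read().split()
print(saved)  # ['MMM', 'ABT', 'ABBV']
```

Drop `header=False` if you want the first line of the file to be `Symbol`.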
If you also want the links assigned to the symbols, then you will have to use requests and BeautifulSoup, because read_html can't give them (at least in pandas versions older than 1.5).
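In pandas 1.5 and newer, read_html has an `extract_links` parameter that returns each cell as a `(text, href)` tuple, which may be enough if you don't want to write the BeautifulSoup loop. This is a sketch on hypothetical offline markup; it assumes pandas >= 1.5 and will raise a TypeError on older versions.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup mimicking the Symbol column with links.
html = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="https://www.nyse.com/quote/XNYS:MMM">MMM</a></td><td>3M</td></tr>
  <tr><td><a href="https://www.nyse.com/quote/XNYS:ABT">ABT</a></td><td>Abbott</td></tr>
</table>
"""

# extract_links="body" turns every body cell into a (text, href) tuple;
# href is None for cells without an <a> tag. Header names stay plain strings.
df = pd.read_html(StringIO(html), attrs={"id": "constituents"}, extract_links="body")[0]

for text, link in df["Symbol"]:
    print(f"{text:5} | {link}")
```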
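import bs4 as bs
import requests

url = 'https://en.wikipedia.org/wiki/List_of_S&P_500_companies'
r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error instead of failing later on a missing table
soup = bs.BeautifulSoup(r.text, "html.parser")

# select the table by its id (the same one used with read_html);
# a multi-word class string like 'wikitable sortable' only matches the exact class attribute
table = soup.find('table', {'id': 'constituents'})

symbols = []
for row in table.find_all('tr')[1:]:  # [1:] to skip header
    items = row.find_all('td')
    symbol = items[0].text.strip()
    link = items[0].find('a')['href']
    symbols.append([symbol, link])
    print(f"{symbol:5} | {link}")

#print(symbols)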
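Why select the table by id rather than by `{'class': 'wikitable sortable'}`? In BeautifulSoup, a class string containing a space must match the whole class attribute exactly, so the lookup silently returns None if the table ever gains an extra class. This offline sketch uses hypothetical markup with an extra `sticky-header` class to show the difference; `select_one` with a CSS selector only requires both classes to be present.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same table, but with one extra class added.
html = """
<table id="constituents" class="wikitable sortable sticky-header">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="https://www.nyse.com/quote/XNYS:MMM">MMM</a></td><td>3M</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-word class string must equal the full attribute value -> no match.
by_exact_class = soup.find("table", {"class": "wikitable sortable"})
# A CSS selector only requires both classes to be present.
by_css = soup.select_one("table.wikitable.sortable")
# The id does not depend on the class list at all.
by_id = soup.find("table", {"id": "constituents"})

print(by_exact_class is None)  # True
print(by_css is by_id)         # True
```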
Result:
MMM | https://www.nyse.com/quote/XNYS:MMM
ABT | https://www.nyse.com/quote/XNYS:ABT
ABBV | https://www.nyse.com/quote/XNYS:ABBV
ABMD | http://www.nasdaq.com/symbol/abmd
ACN | https://www.nyse.com/quote/XNYS:ACN
ATVI | http://www.nasdaq.com/symbol/atvi
ADBE | http://www.nasdaq.com/symbol/adbe
AMD | http://www.nasdaq.com/symbol/amd
AAP | https://www.nyse.com/quote/XNYS:AAP
AES | https://www.nyse.com/quote/XNYS:AES
AFL | https://www.nyse.com/quote/XNYS:AFL
A | https://www.nyse.com/quote/XNYS:A
APD | https://www.nyse.com/quote/XNYS:APD
AKAM | http://www.nasdaq.com/symbol/akam
ALK | https://www.nyse.com/quote/XNYS:ALK
ALB | https://www.nyse.com/quote/XNYS:ALB
ARE | https://www.nyse.com/quote/XNYS:ARE
ALGN | http://www.nasdaq.com/symbol/algn
ALLE | https://www.nyse.com/quote/XNYS:ALLE
LNT | https://www.nyse.com/quote/XNYS:LNT
# ... etc ...
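If you'd rather keep the pairs than print them, one option is to collect them into a DataFrame. The sketch below runs on hypothetical offline markup; its second row deliberately has no link to show a guard against `find('a')` returning None (the original loop assumes every symbol cell contains a link).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup; the second row has no <a> in its first cell.
html = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="https://www.nyse.com/quote/XNYS:MMM">MMM</a></td><td>3M</td></tr>
  <tr><td>XYZ</td><td>Example Corp</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "constituents"})

symbols = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if not cells:  # the header row uses <th>, so it has no <td>
        continue
    a = cells[0].find("a")
    link = a["href"] if a is not None else None  # tolerate cells without a link
    symbols.append([cells[0].get_text(strip=True), link])

df = pd.DataFrame(symbols, columns=["Symbol", "Link"])
print(df)
```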
It is based on my code from my answer to: Running for-loop and skipping stocks with 'KeyError' : Date
The same code is also on GitHub in my repo: python-examples/__scraping__/wikipedia.org - SP500 - requests, BS