I'm trying to build my first web scraper to print out how the stock market is doing on Yahoo Finance. I have worked out how to isolate the information I want, but the output comes back messy. How can I manipulate this data to present it in a more readable way?
import requests
from bs4 import BeautifulSoup
# Fetch the page you want to scrape
html_text = requests.get('https://finance.yahoo.com/').text
soup = BeautifulSoup(html_text, 'lxml')
# Find the part of the page that holds the information you want
sp_market = soup.find('h3', class_ = 'Maw(160px)').text
print(sp_market)
The output here is: S&P 5004,587.18+65.64(+1.45%)
I want to grab elements such as the label and the percentage and isolate them so I can print them however I like. Does anyone know how? Thanks so much!
Edit: the text actually contains a line break:
((S&P 500
4,587.18+65.64(+1.45%)))
CodePudding user response:
For simple splitting you can use the built-in str.split(separator) method (for example, first split by 'x', then by 'y', then by 'z', with x, y and z being separators). This gets clumsy quickly, so if you have more complex patterns that look the same for different elements (here: stocks), take a look at Python's re (regular expression) module.
string = "Stock 45%"
pattern = r'[A-Za-z]+ [0-9][0-9]'  # letters, a space, then two digits
Then consider using a function like re.findall or re.search.
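To make that a bit more concrete, here is a minimal sketch using re.search with named groups. It assumes the scraped text looks like 'S&P 500\n4,587.18+65.64(+1.45%)'; the pattern and the group names (label, price, change, percent) are my own choices for illustration, and the real page may format things differently.

```python
import re

# Assumed sample text; the live page may not match this exactly.
text = "S&P 500\n4,587.18+65.64(+1.45%)"

# One pattern for the whole string: label line, price, signed change, signed percent.
pattern = re.compile(
    r"(?P<label>.+)\n"                   # everything on the first line
    r"(?P<price>[\d,]+\.\d+)"            # e.g. 4,587.18
    r"(?P<change>[+-][\d,]+\.\d+)"       # e.g. +65.64
    r"\((?P<percent>[+-]\d+\.\d+)%\)"    # e.g. (+1.45%)
)

match = pattern.search(text)
# Fall back to an empty dict if the text does not have the assumed shape.
fields = match.groupdict() if match else {}
print(fields)
```

If the pattern does not match, fields is simply empty, so you can check for that before printing anything.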
CodePudding user response:
I assume that the format is always S&P 500\n[number][+/-][number]([+/-][number]%).
If that is the case, we could do the following.
import re
# [your existing code]
# e.g.
# sp_market = 'S&P 500\n4,587.18+65.64(+1.45%)'
label, line2 = sp_market.split('\n')
# Signs of the change and the percentage, in order of appearance
pm = re.findall(r"[+-]", line2)
# Split on runs of sign, parenthesis and percent characters
total, change, percent, _ = re.split(r"[+\-()%]+", line2)
total = float(total.replace(',', ''))
change = float(change)
if pm[0] == '-':
    change = -change
percent = float(percent)
if pm[1] == '-':
    percent = -percent
print(label, total, change, percent)
# S&P 500 4587.18 65.64 1.45
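If you need this for more than one index, the same idea can be wrapped in a small function. This is only a sketch under the same format assumption; parse_quote is a name I made up, and it relies on the fact that float() already understands a leading '+' or '-', so the sign does not need separate handling.

```python
import re

def parse_quote(text):
    """Split 'label\\nprice+change(+percent%)' into a label and three floats."""
    label, line2 = text.split("\n")
    # Each number: optional sign, then digits with optional commas and a decimal part.
    numbers = re.findall(r"[+-]?[\d,]*\.?\d+", line2)
    total, change, percent = (float(n.replace(",", "")) for n in numbers)
    return label, total, change, percent

print(parse_quote("S&P 500\n4,587.18-65.64(-1.45%)"))
# ('S&P 500', 4587.18, -65.64, -1.45)
```

The unpacking raises a ValueError if the second line does not contain exactly three numbers, which is a cheap way to notice when the page layout has changed.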
CodePudding user response:
I'm not sure, because the question does not state an expected result, but you can "isolate" the information with stripped_strings.
This will give you a list of "isolated" values you can process:
list(soup.find('h3', class_ = 'Maw(160px)').stripped_strings)
#Output
['S&P 500', '4,587.18', '+65.64', '(+1.45%)']
For example, stripping the characters "()%" from each value:
[x.strip('()%') for x in soup.find('h3', class_ = 'Maw(160px)').stripped_strings]
#Output
['S&P 500', '4,587.18', '+65.64', '+1.45']
The simplest way to print the data less sloppily is to join() the values with a space:
' '.join([x.strip('()%') for x in soup.find('h3', class_ = 'Maw(160px)').stripped_strings])
#Output
S&P 500 4,587.18 +65.64 +1.45
You can also create a dict() and print the key/value pairs:
values = [x.strip('()%') for x in soup.find('h3', class_ = 'Maw(160px)').stripped_strings]
for k, v in dict(zip(['Symbol', 'Last Price', 'Change', '% Change'], values)).items():
    print(f'{k}: {v}')
#Output
Symbol: S&P 500
Last Price: 4,587.18
Change: +65.64
% Change: +1.45
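If you want more control over the layout than join() gives you, f-string format specs are one option. The sketch below assumes you already have the four cleaned strings in a list (the values here are hard-coded stand-ins for the scraped ones), and format_quote is a hypothetical helper name, not part of BeautifulSoup.

```python
# Stand-in for the cleaned list produced from stripped_strings.
values = ["S&P 500", "4,587.18", "+65.64", "+1.45"]

def format_quote(values):
    label, price, change, percent = values
    # Convert to numbers first so the format specs can be applied.
    price = float(price.replace(",", ""))
    change = float(change)
    percent = float(percent)
    # ',' adds thousands separators, '+' always shows the sign, '.2f' keeps two decimals.
    return f"{label:<10} {price:>10,.2f} {change:>+8.2f} ({percent:+.2f}%)"

line = format_quote(values)
print(line)
```

The fixed column widths (10, 10 and 8) are arbitrary; adjust them if you print several indices and want the columns to line up.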