I am trying to write a regular expression that extracts the headlines from a stock's news feed.
This is the code I have so far; the regular expression is "<title.*?</":
def yahoo_hl(ticker):
    import re, requests
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
    xml = requests.get(f'https://feeds.finance.yahoo.com/rss/2.0/headline?s={ticker}', headers=headers).text
    news_headlines = re.findall(r'<title.*?</', xml, re.DOTALL)  # put your regular expression between the single quotes
    return news_headlines
When I run it, the output contains the headlines, but each one still has "<title>" at the beginning and "</" at the end:
['<title>Yahoo! Finance: TSLA News</',
'<title>Tesla Is About to Start Production at Its Berlin Gigafactory</',
'<title>Tesla CEO Elon Musk Wants the U.S. and the World to Pump More Oil</',
'<title>Tesla Gets Stronger With Oil Rising, Other EV Stocks Not So Much</',
'<title>What Is The Boring Company?</']
The goal is to remove the "<title>" and "</" so that the headlines are output like this:
['Yahoo! Finance: TSLA News',
'Tesla Is About to Start Production at Its Berlin Gigafactory',
'Tesla CEO Elon Musk Wants the U.S. and the World to Pump More Oil',
'Tesla Gets Stronger With Oil Rising, Other EV Stocks Not So Much',
'What Is The Boring Company?']
Any help would be appreciated. Thank you in advance.
CodePudding user response:
You can make a "capturing group" in the regex:
import re, requests

def yahoo_hl(ticker):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
    xml = requests.get(f'https://feeds.finance.yahoo.com/rss/2.0/headline?s={ticker}', headers=headers).text
    news_headlines = re.findall(r'<title>(.*?)</title', xml, re.DOTALL)
    return news_headlines

print(*yahoo_hl('TSLA'), sep='\n')  # yahoo_hl('TSLA') is the list you want
Output:
Yahoo! Finance: TSLA News
Tesla Is About to Start Production at Its Berlin Gigafactory
Tesla CEO Elon Musk Wants the U.S. and the World to Pump More Oil
Tesla Gets Stronger With Oil Rising, Other EV Stocks Not So Much
What Is The Boring Company?
...
The relevant behavior is described in the documentation for re.findall:
The result depends on the number of capturing groups in the pattern. If there are no groups, return a list of strings matching the whole pattern. If there is exactly one group, return a list of strings matching that group.
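To see that rule in action without hitting the network, here is a minimal sketch that runs the same kind of pattern with zero, one, and two capturing groups. The sample_xml string is a made-up stand-in for the feed text, not real output from the Yahoo endpoint, and the variable names are only for illustration.

```python
import re

# Hypothetical stand-in for the feed text, so the example runs offline.
sample_xml = (
    "<title>Yahoo! Finance: TSLA News</title>"
    "<item><title>What Is The Boring Company?</title></item>"
)

# No groups: each result is the whole match, tags included.
no_group = re.findall(r"<title>.*?</title>", sample_xml, re.DOTALL)

# One group: each result is only the text captured by that group.
one_group = re.findall(r"<title>(.*?)</title>", sample_xml, re.DOTALL)

# Two groups: each result is a tuple with one entry per group.
two_groups = re.findall(r"<(title)>(.*?)</title>", sample_xml, re.DOTALL)

print(no_group)
print(one_group)
print(two_groups)
```

Only the one-group form gives the plain list of headlines, which is why moving the parentheses around ".*?" is enough to drop the surrounding tags.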