Using Python, I want to take the HTML source of this site - https://www.cboe.com/us/equities/market_statistics/corporate_action/ - and save the first text file in the table, named "corporate_action_rpt_20220621.txt". Right now I'm able to read the following HTML line from the site's source using BeautifulSoup:
<a href="2022/06/bzx_equities_corporate_action_rpt_20220621.txt-dl">corporate_action_rpt_20220621.txt</a>
Here is the code I used:
import requests
from bs4 import BeautifulSoup
import os
URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')
textFileRow = table.tbody.find('tr').find('td').find('a')
print(textFileRow)
How would I open and save the text file from here using Python?
CodePudding user response:
You have to fetch the file using the URL in the href of the <a> tag you have retrieved. The href is relative to the page, so it must be appended to the page URL, like so:
import requests
from bs4 import BeautifulSoup
import os
URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')
textFileRow = table.tbody.find('tr').find('td').find('a')
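# The href is relative to the page URL, so append it to URL to get the file's address
r = requests.get(URL + textFileRow['href'])
r.encoding = 'utf-8'
# Write the downloaded text to a local file
with open("textFile.txt", "w", encoding="utf-8") as text_file:
    text_file.write(r.text)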
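Plain string concatenation works here only because the page URL ends in a slash and the href is relative. A slightly more defensive sketch, assuming the same page and link, resolves the href with urllib.parse.urljoin and checks the HTTP status before saving. The helper names build_file_url and download_text_file, and the 30-second timeout, are illustrative choices and not part of the original answer:

```python
from urllib.parse import urljoin


def build_file_url(page_url, href):
    """Resolve a (possibly relative) href against the page it came from."""
    return urljoin(page_url, href)


def download_text_file(page_url, href, out_path, timeout=30):
    """Fetch the linked file and write it to out_path; returns the URL used."""
    # Imported here so build_file_url stays usable without third-party packages.
    import requests

    file_url = build_file_url(page_url, href)
    response = requests.get(file_url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    # Write the raw bytes so no text encoding has to be guessed.
    with open(out_path, "wb") as out_file:
        out_file.write(response.content)
    return file_url
```

With the variables from the code above, this could be called as download_text_file(URL, textFileRow['href'], "textFile.txt").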