I am very new to coding and was trying to get a basic web scraping script to work. The script runs without errors, but the CSV file it writes appears to be empty. Any help would be appreciated.
from bs4 import BeautifulSoup
import requests
import csv

page_to_scrape = requests.get("https://www.scrapethissite.com/pages/")
soup = BeautifulSoup(page_to_scrape.text, "html.parser")

# attrs must be a dict; ("class" == "...") just evaluates to False
descriptions = soup.find_all("p", attrs={"class": "lead session-desc"})
titles = soup.find_all("h3", attrs={"class": "page-title"})

with open("scrapeinformation.csv", "w", newline="") as f:
    thewriter = csv.writer(f)
    for title, desc in zip(titles, descriptions):
        print(title.text + " - " + desc.text)
        thewriter.writerow([title.text, desc.text])
# no f.close() needed: the with block closes the file automatically
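A side note on the filters: the expression ("class" == "lead session-desc") compares two strings and evaluates to False, so find_all receives attrs=False rather than a class filter. A minimal offline sketch of the difference, using a small hypothetical HTML snippet in place of the live page (the snippet and variable names are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<h3 class="page-title">Countries</h3>
<p class="lead session-desc">A single page.</p>
<p class="footer">Not a description</p>
"""
soup = BeautifulSoup(html, "html.parser")

# The original argument is just a boolean, not a filter.
bad_attrs = ("class" == "lead session-desc")
print(bad_attrs)  # False

# A dict (or the class_ keyword) actually filters on the class attribute.
filtered_p = soup.find_all("p", attrs={"class": "lead session-desc"})
via_class_kw = soup.find_all("p", class_="lead session-desc")
print(len(filtered_p), len(via_class_kw))  # 1 1
```

With a real filter, the footer paragraph is excluded, so zip() pairs each title with its own description.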
Answer:
Are you absolutely sure the CSV is empty? When I ran your code, the file looked empty when I viewed it in Excel, but not when I opened it in Notepad or Google Sheets. The output of print(title.text + " - " + desc.text) also shows that the cell entries are surrounded by a lot of whitespace.
So the Excel cells are actually showing only the whitespace at the start of each entry, because the default cell format doesn't display more than what fits in the cell. I can see the contents after I:
- Select all by pressing Ctrl+A, and then
- Toggle the Wrap Text setting by pressing Alt, then H, then W (try toggling once more if there seems to be no difference the first time).
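If you'd rather check the file without Excel, you can read it back in Python and print each cell with repr(), which makes leading and trailing whitespace (including newlines) visible. A minimal sketch; the helper name inspect_csv and the sample values are hypothetical, and the demo writes to a temporary file so it won't overwrite scrapeinformation.csv:

```python
import csv
import os
import tempfile


def inspect_csv(path):
    """Print each row of a CSV with repr() so hidden whitespace is visible."""
    # Same default encoding as the open() call in the question's script.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    print(f"{len(rows)} row(s) in {path}")
    for row in rows:
        print([repr(cell) for cell in row])
    return rows


# Demo on a temporary file holding a row padded the way .text can return it
# (hypothetical values; call inspect_csv("scrapeinformation.csv") for the real file).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="") as f:
        csv.writer(f).writerow(["\n    Countries\n  ", "\n  A single page.\n  "])
    rows = inspect_csv(demo_path)
```

A cell that starts with a newline is exactly the kind of value that looks blank in an unwrapped Excel cell, even though the text is there.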
However, the approach I would personally recommend here is to remove the whitespace in the first place: you can do so with the strip() method (as in .text.strip()) or by using .get_text(strip=True) instead of .text.
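Both options can be compared on a small offline example. This is a sketch that uses hypothetical markup shaped like the page's titles and descriptions, and writes to an in-memory buffer instead of scrapeinformation.csv:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup with surrounding whitespace like the real page.
html = """
<h3 class="page-title">
    Countries of the World
</h3>
<p class="lead session-desc">
    A single page.
</p>
"""
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h3", class_="page-title")
desc = soup.find("p", class_="lead session-desc")

print(repr(title.text))                 # includes newlines and indentation
print(repr(title.text.strip()))         # 'Countries of the World'
print(repr(desc.get_text(strip=True)))  # 'A single page.'

# Write the cleaned values to an in-memory CSV to show what the file would hold.
buffer = io.StringIO()
csv.writer(buffer).writerow([title.get_text(strip=True), desc.get_text(strip=True)])
print(repr(buffer.getvalue()))  # 'Countries of the World,A single page.\r\n'
```

In the question's loop, that means writing thewriter.writerow([title.get_text(strip=True), desc.get_text(strip=True)]). One difference to keep in mind: for elements containing nested tags, get_text(strip=True) strips each text fragment and joins them with no separator, so get_text(" ", strip=True) may read better there, whereas .text.strip() only trims the two ends.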