Hi, I'm new to Python programming. I'm trying to scrape a news website using Python. I got the titles and their links, but when I try to save them to an Excel file it raises a ValueError.
Here is the source code and the error:
import requests, openpyxl
from bs4 import BeautifulSoup
excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = 'Maalaimalar Links'
sheet.append(['Title','Link'])
req = requests.get("https://www.maalaimalar.com/news/topnews/1")
head_lines = BeautifulSoup(req.text, 'html.parser')
hliness = head_lines.find_all('div', class_ = 'col-md-4 article')
for hlines in hliness:
    h2lines = hlines.find('h3').text
    link = hlines.find('a')
    print(h2lines)
    print(link.get('href'))
    sheet.append([h2lines, link])
excel.save('maalaimalar.xlsx')
This is the error I get when this line executes:
sheet.append([h2lines, link])
ValueError: Cannot convert <a href="https://www.maalaimalar.com/news/topnews/2022/03/06182721/3549285/IPL-2022-Schedule-match-details-for-Chennai-super.vpf"><h3>ஐபிஎல் 2022 அட்டவணை- சென்னை அணி மோதும் ஆட்டங்கள் விவரம்</h3></a> to Excel.
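The error can be reproduced offline with a small sketch. The HTML snippet below is hypothetical, modeled on the tag shown in the error message (with a placeholder example.com URL), so no network request is needed. It shows that appending the `<a>` Tag itself raises the ValueError, while appending its href string works:

```python
import openpyxl
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the tag in the error message (placeholder URL)
html = (
    '<div class="col-md-4 article">'
    '<a href="https://example.com/story"><h3>Example headline</h3></a>'
    '</div>'
)
soup = BeautifulSoup(html, 'html.parser')
article = soup.find('div', class_='col-md-4 article')
title = article.find('h3').text    # a plain str
link_tag = article.find('a')       # a bs4 Tag object, not a string

sheet = openpyxl.Workbook().active

# openpyxl cells accept plain values such as str, numbers, dates, bool and None,
# so passing the Tag object itself raises ValueError
error = None
try:
    sheet.append([title, link_tag])
except ValueError as exc:
    error = exc
print(error)

# Appending the href string extracted from the Tag works
href = link_tag.get('href')
sheet.append([title, href])
```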
Answer:
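You are trying to push the BeautifulSoup Tag object into your Excel sheet instead of extracting the href string, as you already do in print(link.get('href')). openpyxl can only store plain values such as strings, numbers, and dates, so it raises the ValueError. Extract the href instead: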
link = hlines.find('a').get('href')
or
link = hlines.a.get('href')
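Both forms above find the first `<a>` inside the block (`hlines.a` is BeautifulSoup shorthand for `hlines.find('a')`), and both return None when there is no link, so calling `.get('href')` on the result would then fail with an AttributeError. A minimal sketch of a guard, assuming some article blocks might lack a link; `extract_href` is a hypothetical helper and the markup is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second article block has no <a> tag
html = (
    '<div class="col-md-4 article"><a href="https://example.com/one"><h3>One</h3></a></div>'
    '<div class="col-md-4 article"><h3>Two</h3></div>'
)
soup = BeautifulSoup(html, 'html.parser')


def extract_href(block):
    """Hypothetical helper: return the first <a>'s href, or None if there is no <a>."""
    link = block.find('a')  # equivalent to block.a
    return link.get('href') if link is not None else None


hrefs = [extract_href(b) for b in soup.find_all('div', class_='col-md-4 article')]
print(hrefs)
```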
Example
import openpyxl
import requests
from bs4 import BeautifulSoup
excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = 'Maalaimalar Links'
sheet.append(['Title','Link'])
req = requests.get("https://www.maalaimalar.com/news/topnews/1")
head_lines = BeautifulSoup(req.text, 'html.parser')
hliness = head_lines.find_all('div', class_ = 'col-md-4 article')
for hlines in hliness:
    h2lines = hlines.find('h3').text
    # store the href string, not the <a> Tag object
    link = hlines.find('a').get('href')
    sheet.append([h2lines, link])
excel.save('maalaimalar.xlsx')
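One possible variation of the corrected script separates parsing from fetching, so the scraping logic can be checked against a static HTML string without hitting the site. This is a sketch, not the only way to structure it: `parse_headlines` and `build_workbook` are hypothetical names, the sample markup is made up, and blocks missing an `<h3>` or `<a>` are simply skipped. The workbook is saved to an in-memory buffer and read back to confirm only plain strings were written:

```python
import io

import openpyxl
from bs4 import BeautifulSoup


def parse_headlines(html):
    """Return (title, href) string pairs for each 'col-md-4 article' block."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for block in soup.find_all('div', class_='col-md-4 article'):
        h3 = block.find('h3')
        a = block.find('a')
        if h3 is None or a is None:
            continue  # skip blocks without a headline or a link
        rows.append((h3.get_text(strip=True), a.get('href')))
    return rows


def build_workbook(rows):
    """Create a workbook with a header row followed by one row per headline."""
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet.title = 'Maalaimalar Links'
    sheet.append(['Title', 'Link'])
    for title, href in rows:
        sheet.append([title, href])
    return wb


# Hypothetical sample markup standing in for the live page
sample_html = (
    '<div class="col-md-4 article"><a href="https://example.com/a"><h3>First</h3></a></div>'
    '<div class="col-md-4 article"><a href="https://example.com/b"><h3>Second</h3></a></div>'
)
rows = parse_headlines(sample_html)
wb = build_workbook(rows)

# Save to an in-memory buffer and read it back to check what was written
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)
saved = openpyxl.load_workbook(buffer).active
saved_rows = list(saved.iter_rows(values_only=True))
print(saved_rows)
```

Against the live page, the same functions would be called as `build_workbook(parse_headlines(requests.get(url).text)).save('maalaimalar.xlsx')`.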