Below is the code I am using:
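import requests
from bs4 import BeautifulSoup

for link in f:  # f is an iterable of page URLs
    r = requests.get(link, verify=True)
    soup = BeautifulSoup(r.content, 'html5lib')
    soup.encode('utf-8')  # has no effect: the return value is discarded
    # Find the <div class="right"> that wraps the download button
    table = soup.find('div', attrs={'class': 'right'})
    print(table.div.a)
    download = table.div.a['href']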
Here, instead of getting the download link, I am getting "#".
Link I am scraping: https://www54.zippyshare.com/v/2Mu2T2KI/file.html
Desired output:
/d/2qeYvgEb/29682/Horizon - Zero Dawn CE -- fitgirl-repacks.site --.part01.rar
Actual output:
#
CodePudding user response:
The problem is that the link doesn't exist in the original HTML; the href starts out as "#" and is filled in afterwards by a JavaScript script. To find the link you need to let the JavaScript run. I do not think this is possible using bs4 alone, because it only parses the markup it is given.
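To see why a static parser returns "#", here is a minimal sketch using only the standard library. The sample markup, the dlbutton id and the /d/example/file.rar path are made up for illustration and are not taken from the real page; the point is only that a parser which does not execute scripts reports the href exactly as written.

```python
from html.parser import HTMLParser

# Hypothetical page: the anchor starts as "#" and a script fills it in later.
SAMPLE_HTML = """
<div class="right">
  <div><a id="dlbutton" href="#">Download</a></div>
</div>
<script>
  document.getElementById('dlbutton').href = "/d/example/file.rar";
</script>
"""


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, exactly as written in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


collector = HrefCollector()
collector.feed(SAMPLE_HTML)
print(collector.hrefs)  # the script never ran, so only the placeholder is seen
```

The script body is treated as plain text by the parser, so the assignment inside it is never applied.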
Another way to do it is to use requests-html:
from requests_html import HTMLSession

session = HTMLSession()
root = 'https://www54.zippyshare.com'
link = 'https://www54.zippyshare.com/v/2Mu2T2KI/file.html'
r = session.get(link)
r.html.render()  # run the page's JavaScript so the real href is filled in
# The href is relative, so prepend the site root
download_link = root + r.html.find('.right', first=True).find('a', first=True).attrs['href']
Note that I used the CSS selector .right; this is the same as class: right, though the latter is not supported in requests-html. The first=True argument does the same job as using find instead of find_all in bs4.
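As a possible alternative to joining the site root and the href by hand, the standard library's urljoin can resolve the relative href against the page URL. This is only a sketch: the href value below is a shortened, hypothetical stand-in for whatever the rendered page returns.

```python
from urllib.parse import urljoin

page_url = "https://www54.zippyshare.com/v/2Mu2T2KI/file.html"
# Hypothetical href, standing in for the value read from the rendered page
href = "/d/2qeYvgEb/29682/file.part01.rar"

# A leading "/" makes the href replace the whole path of page_url
download_link = urljoin(page_url, href)
print(download_link)
```

This avoids hard-coding the root, which matters if the links in the input file point at different zippyshare subdomains.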
CodePudding user response:
This prints every link on the page other than "#":
import requests
from bs4 import BeautifulSoup

url = "https://www54.zippyshare.com/v/2Mu2T2KI/file.html"
r = requests.get(url)
htmlcontent = r.content
soup = BeautifulSoup(htmlcontent, 'html.parser')
# print(soup.prettify())
# Print the href of every <a> tag, skipping the "#" placeholders
links = soup.find_all("a")
for data in links:
    dx = data.get('href')
    if dx != "#":
        print(dx)
Hope this helps.