I am trying to get a link from a website, but when I read the link it shows as "#"

Time:09-22

Below is the code I am using:

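import requests
from bs4 import BeautifulSoup

# f holds the page URLs to scrape (e.g. an open file of links)
for link in f:
    r = requests.get(link, verify=True)
    soup = BeautifulSoup(r.content, 'html5lib')
    table = soup.find('div', attrs={'class': 'right'})
    print(table.div.a)
    download = table.div.a['href']  # this comes back as "#"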

Instead of the download link, I am getting "#". The link I expect looks like this:

/d/2qeYvgEb/29682/Horizon - Zero Dawn CE -- fitgirl-repacks.site --.part01.rar

Output I am getting: #

Answer:

The problem is that the real link isn't in the original HTML; a JavaScript script sets it after the page loads. To find the link you need to let that JavaScript run, and bs4 can't do this: it only parses the HTML it is given and never executes scripts.
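To see why an HTML parser only ever reports "#", here is a minimal sketch using Python's built-in html.parser on a hypothetical page. SAMPLE_HTML, the dlbutton id and the /d/abc123/file.rar path are made up for illustration and are not the real site's markup. The parser returns the href exactly as written in the markup, while the real path exists only as text inside the script, which nothing here executes.

```python
from html.parser import HTMLParser

# Hypothetical page: the anchor ships with href="#" and an inline script
# would rewrite it in a browser. Illustrative only, not the real site's HTML.
SAMPLE_HTML = """
<div class="right">
  <div><a id="dlbutton" href="#">Download</a></div>
  <script>
    document.getElementById('dlbutton').href = "/d/abc123/file.rar";
  </script>
</div>
"""


class HrefCollector(HTMLParser):
    """Collects href values as they appear in the static HTML, plus script text."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text; they are never run
        if self._in_script:
            self.scripts.append(data)


parser = HrefCollector()
parser.feed(SAMPLE_HTML)
print(parser.hrefs)  # ['#']
print("/d/abc123/file.rar" in "".join(parser.scripts))  # True
```

bs4 behaves the same way for this reason: the "#" is what the server actually sent, so the fix has to involve running the script (or reading it), not parsing harder.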

One way to do that is requests-html, which can render the page and run its JavaScript:

from requests_html import HTMLSession

session = HTMLSession()

root = 'https://www54.zippyshare.com'
link = 'https://www54.zippyshare.com/v/2Mu2T2KI/file.html'

r = session.get(link)
r.html.render()  # runs the page's JavaScript (downloads Chromium on first use)

download_link = root + r.html.find('.right', first=True).find('a', first=True).attrs['href']
print(download_link)

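Note that I used the CSS selector .right; it matches the same element as attrs={'class': 'right'} in bs4, but requests-html's find() only accepts CSS selectors, not attribute dictionaries. The first=True argument does the same job as using find instead of find_all in bs4: it returns the first match instead of a list.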
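The selector equivalence can be checked with plain bs4, without rendering anything. This sketch assumes a small hypothetical snippet shaped like the question's page. select_one takes a CSS selector the way requests-html's find() does, and find versus find_all mirrors first=True versus the default list result.

```python
from bs4 import BeautifulSoup

# Hypothetical static markup with the same shape as the question's page
html = '<div class="right"><div><a href="#">one</a><a href="/x">two</a></div></div>'
soup = BeautifulSoup(html, "html.parser")

# Attribute-dict lookup, as in the question's code
by_attrs = soup.find("div", attrs={"class": "right"})
# CSS selector lookup, the style requests-html's find() expects
by_css = soup.select_one(".right")
print(by_attrs == by_css)  # True: both locate the same <div>

# First match vs. all matches
first_a = by_css.find("a")      # comparable to find('a', first=True) in requests-html
all_a = by_css.find_all("a")    # comparable to find('a') without first=True
print(first_a["href"])
print([a["href"] for a in all_a])
```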

Answer:

This prints every link on the page except the "#" placeholders:

import requests
from bs4 import BeautifulSoup

url = "https://www54.zippyshare.com/v/2Mu2T2KI/file.html"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Print every href, skipping anchors without one and "#" placeholders
for anchor in soup.find_all("a"):
    href = anchor.get('href')
    if href and href != "#":
        print(href)

Hope this helps.
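Whichever approach you use, the href values are often relative, like the /d/... path in the question. Here is a sketch using the standard library's urljoin to resolve them against the page URL instead of prepending the site root by hand; the href list below is hypothetical.

```python
from urllib.parse import urljoin

# Page URL from the answers above; the href values are made-up examples
page_url = "https://www54.zippyshare.com/v/2Mu2T2KI/file.html"
hrefs = ["#", None, "/d/abc123/file.rar", "https://example.com/other"]

# Skip missing hrefs and "#" placeholders, then resolve the rest
absolute = [urljoin(page_url, h) for h in hrefs if h and h != "#"]
for link in absolute:
    print(link)
```

urljoin leaves already-absolute links untouched, which plain string concatenation with the root would break.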
