Getting an error while fetching details with web scraping in Python

Time:10-27

I am getting an error while scraping data from a site. Could anyone help me with it? My code:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.cryptocompare.com/coins/btc/influence/USDT').text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one('  # col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a').text
print(total_commit)

Error:

soupsieve.util.SelectorSyntaxError: Malformed id selector at position 2
  line 1:
  # col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a
  ^

Also, could anyone tell me how to use CSS selectors copied directly from the browser's Inspect Element panel in bs4?

CodePudding user response:

Try removing the space between # and col-body:

html = requests.get('https://www.cryptocompare.com/coins/btc/influence/USDT').text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one('#col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a').text
print(total_commit)
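
The effect of that single space can be checked offline. The following is a minimal sketch, assuming a small stand-in HTML string (SAMPLE_HTML and the try_select helper are made up for illustration and are not the real page): the selector with a space after # is rejected by soupsieve, while the same selector without the space parses and matches.

```python
from bs4 import BeautifulSoup
from soupsieve import SelectorSyntaxError

# Hypothetical stand-in markup, not the real cryptocompare.com page.
SAMPLE_HTML = (
    '<div id="col-body"><div>'
    '<a href="https://example.com/repo">repo</a>'
    '</div></div>'
)
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def try_select(selector):
    """Return the first matching tag, or an error string if the selector is malformed."""
    try:
        return soup.select_one(selector)
    except SelectorSyntaxError as exc:
        return f"syntax error: {exc}"


# "#" followed by a space is not a valid id selector.
broken = try_select("# col-body > div > a")
# Without the space, "#col-body" selects the element with id="col-body".
fixed = try_select("#col-body > div > a")

print(broken)
print(fixed["href"])
```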

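That fixes the syntax error, but the selector still matches nothing, because part of the HTML is generated by JavaScript. requests only downloads the initial markup and does not run scripts, so you need something that behaves like a real web browser (for example, Selenium). This is what requests actually receives for that element: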

<div class="col-body col-body-new" id="col-body" ui-view>
    <div class="loader-ccc">
        <div class="loader-ccc-logo"></div>
        <div class="loader-ccc-sides"></div>
    </div>

In the web browser, once the JavaScript has run, the information is present.
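
As a rough illustration of why calling .text on the result fails with static HTML, here is a sketch that parses the loader placeholder quoted above (with the outer div closed so the fragment is complete, which is an assumption about the rest of the page). select_one returns None when nothing matches, so it is safer to check for that before reading attributes.

```python
from bs4 import BeautifulSoup

# The loader placeholder from the static response, closed here so that it
# parses as a complete fragment (the closing tag is an assumption).
STATIC_HTML = """
<div class="col-body col-body-new" id="col-body" ui-view>
  <div class="loader-ccc">
    <div class="loader-ccc-logo"></div>
    <div class="loader-ccc-sides"></div>
  </div>
</div>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# The repository link is not in the static markup, so nothing matches.
link = soup.select_one("#col-body div.repo-tag a")

if link is None:
    # Calling link.text here would raise AttributeError on None.
    result = "not in static HTML"
else:
    result = link.get_text(strip=True)

print(result)
```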

CodePudding user response:

As David Miró mentioned, removing the whitespace fixes the syntax error, but to actually get the data you need Selenium.

Selenium renders the page in a real browser. You can then pass its page_source to bs4 and select your element with a much shorter selector:

soup.select_one('div.repo-tag a')['href']

Example

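from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('YOUR PATH TO DRIVER'))
driver.get('https://www.cryptocompare.com/coins/btc/influence/USDT')

# Parse the HTML as rendered by the browser
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

print(soup.select_one('div.repo-tag a')['href'])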

Output

https://github.com/bitcoin/bitcoin
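
On the question of selectors copied from Inspect Element: they work in bs4 as long as they are syntactically valid, but a long chain of > steps tends to break as soon as the layout changes. The sketch below compares a copied-style selector with the short one used above on two made-up layouts (LAYOUT_A, LAYOUT_B, and href_or_none are hypothetical and only loosely modelled on the page).

```python
from bs4 import BeautifulSoup

LINK = (
    '<div class="repo-tag"><span><span>'
    '<a href="https://github.com/bitcoin/bitcoin">repo</a>'
    '</span></span></div>'
)
# Two hypothetical layouts; the second adds one wrapper element.
LAYOUT_A = f'<div id="col-body"><div class="td-col">{LINK}</div></div>'
LAYOUT_B = f'<div id="col-body"><section><div class="td-col">{LINK}</div></section></div>'

# A copied-style selector that spells out every level of nesting.
COPIED = "#col-body > div.td-col > div.repo-tag > span > span > a"
# A short selector that only names the distinctive class.
SHORT = "div.repo-tag a"


def href_or_none(html, selector):
    """Return the href of the first match, or None if nothing matches."""
    tag = BeautifulSoup(html, "html.parser").select_one(selector)
    return tag["href"] if tag is not None else None


results = {
    ("A", "copied"): href_or_none(LAYOUT_A, COPIED),
    ("A", "short"): href_or_none(LAYOUT_A, SHORT),
    ("B", "copied"): href_or_none(LAYOUT_B, COPIED),
    ("B", "short"): href_or_none(LAYOUT_B, SHORT),
}
for key, value in results.items():
    print(key, value)
```

In this sketch the copied selector stops matching once the extra wrapper appears, while the short one keeps working, which is why trimming a copied selector down to its distinctive part is usually worth the effort.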