I am getting an error while scraping data from a site. Could anyone help me with it? My code:
import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.cryptocompare.com/coins/btc/influence/USDT').text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one(' # col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a').text
print(total_commit)
Error:
soupsieve.util.SelectorSyntaxError: Malformed id selector at position 2
line 1:
# col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a
^
Also, could anyone tell me how to use CSS selectors copied directly from the browser's Inspect Element in bs4?
CodePudding user response:
Try removing the space between # and col-body.
html = requests.get('https://www.cryptocompare.com/coins/btc/influence/USDT').text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one('#col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a').text
print(total_commit)
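To see the selector problem in isolation, here is a minimal sketch on a made-up HTML string (the markup, the example.com link and the try_select helper are assumptions for illustration, not the real page). It suggests that the space after # is what triggers the syntax error, and that a selector copied from Inspect Element can be used as-is once the id follows # directly:

```python
from bs4 import BeautifulSoup
from soupsieve.util import SelectorSyntaxError

# Hypothetical miniature page, only for illustration
html = '<div id="col-body"><div class="repo-tag"><a href="https://example.com/repo">repo</a></div></div>'
soup = BeautifulSoup(html, 'html.parser')

def try_select(selector):
    """Return the first match, or None if the selector is malformed."""
    try:
        return soup.select_one(selector)
    except SelectorSyntaxError as exc:
        print('bad selector:', exc)
        return None

broken = try_select('# col-body > div > a')  # space after '#': malformed id selector
fixed = try_select('#col-body > div > a')    # id written directly after '#'
print(broken, fixed['href'])
```

Long copied selectors tend to break when the page layout changes, so a shorter one anchored on a distinctive class (such as div.repo-tag a) is usually easier to maintain.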
But this still doesn't return the data, because part of the HTML is generated by JavaScript. You need to render the page in a real browser (for example with Selenium). The HTML that requests receives contains only a loader:
<div class="col-body col-body-new" id="col-body" ui-view>
<div class="loader-ccc">
<div class="loader-ccc-logo"></div>
<div class="loader-ccc-sides"></div>
</div>
In the web browser, the information exists once the JavaScript has run.
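A quick way to check this without a browser, sketched here on the loader markup quoted above (with a closing tag added so the fragment is complete, which is an assumption): select_one returns None when nothing matches, so calling .text on the result would fail with an AttributeError.

```python
from bs4 import BeautifulSoup

# Static markup as quoted above; the final </div> is added to close the fragment
static_html = '''
<div class="col-body col-body-new" id="col-body" ui-view>
<div class="loader-ccc">
<div class="loader-ccc-logo"></div>
<div class="loader-ccc-sides"></div>
</div>
</div>
'''
soup = BeautifulSoup(static_html, 'html.parser')

# select_one returns None when the selector matches nothing
link = soup.select_one('div.repo-tag a')
if link is None:
    message = 'not in static HTML, needs a rendered page'
else:
    message = link['href']
print(message)
```

Checking for None before reading .text or ['href'] makes it obvious whether the problem is the selector or the missing content.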
CodePudding user response:
As mentioned by David Miró, removing the whitespace will fix the error, but to reach your goal you have to use Selenium. Selenium will render the website, and you can then take the page_source and select your element with bs4:
soup.select_one('div.repo-tag a')['href']
Example
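from bs4 import BeautifulSoup
from selenium import webdriver
# Older Selenium releases take the driver path here;
# newer ones (4.6+) can find the driver themselves via webdriver.Chrome()
driver = webdriver.Chrome('YOUR PATH TO DRIVER')
driver.get('https://www.cryptocompare.com/coins/btc/influence/USDT')
# page_source holds the HTML after the browser has run the JavaScript
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.select_one('div.repo-tag a')['href'])
driver.quit()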
Output
https://github.com/bitcoin/bitcoin
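One caveat: page_source may be read before the JavaScript has finished, in which case the element is still missing. A possible variant, as a sketch only, waits explicitly for the element first. The function names fetch_rendered_html and extract_repo_link and the 15 second timeout are my own choices, and webdriver.Chrome() without a path assumes a Selenium version that locates the driver itself:

```python
from bs4 import BeautifulSoup

def fetch_rendered_html(url, selector, timeout=15):
    """Load url in Chrome and return the HTML once selector is present."""
    # Imported here so the parsing helper below works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumption: Selenium finds the driver itself
    try:
        driver.get(url)
        # Block until the element exists in the DOM, or raise TimeoutException
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout

def extract_repo_link(html, selector='div.repo-tag a'):
    """Return the href of the first match, or None if nothing matches."""
    link = BeautifulSoup(html, 'html.parser').select_one(selector)
    return link['href'] if link is not None else None

# Usage (opens a real browser, so it is left commented out):
# html = fetch_rendered_html('https://www.cryptocompare.com/coins/btc/influence/USDT', 'div.repo-tag a')
# print(extract_repo_link(html))
```

Keeping the parsing in its own function means the selector can be tried on saved HTML without starting a browser each time.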