(modified after the advice from the comments)
What I'm trying to do: Web scraping to get all 100 cryptocurrency names from this website.
Problem: The selector I used doesn't select all crypto names on the page. Only some of them are selected.
In that webpage, there are 100 rows, and each row has one cryptocurrency name. So 100 names in total.
If I use the selector tr td a[href*="/currencies/"][href$="/"]:not([href$="/markets/"]), it finds all 100 elements on the page, and I can get each crypto's URL from them.
One of the 100 selected elements:
<a href="/currencies/ethereum/" ><div ><img src="https://s2.coinmarketcap.com/static/img/coins/64x64/1027.png" loading="lazy" alt="ETH logo"><div ><p font-weight="semibold" color="text" font-size="1" >Ethereum</p><div ><div >2</div><p color="text3" font-size="1">ETH</p></div></div></div></a>
No problem so far.
But if I append >div>div>p to this selector to get each crypto name, i.e. tr td a[href*="/currencies/"][href$="/"]:not([href$="/markets/"])>div>div>p, only 19 elements are selected instead of all 100.
One of the selected elements:
<p font-weight="semibold" color="text" font-size="1" >Ethereum</p>
If I scroll down the page, the number of selected elements increases, but I want to get all 100 elements without scrolling.
- Why does the first selector find all the elements right away, while the second one only matches more as I scroll down the page?
- What selector should I use to select all 100 cryptocurrency names without scrolling down?
Thanks in advance :)
(I'm doing the web scraping with Playwright, but I think this is just a general question about CSS selectors)
CodePudding user response:
This isn't a problem with CSS selectors. The page is loaded dynamically, so to get its contents you need to make an API call to
https://api.coinmarketcap.com/data-api/v3/map/all?listing_status=active,untracked&exchangeAux=is_active,status&cryptoAux=is_active,status&start=10001&limit=10000
Once you do that, you get back JSON. To extract the currency code from it, for example, parse the response and take the currency map:
const obj = JSON.parse(response);
const curs = obj['data']['cryptoCurrencyMap'];
If you then run
for (const cur of curs) {
  console.log(cur['symbol']);
}
you should get quite a few codes...
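The steps above could be wrapped up roughly like this. It is only a sketch: the response shape (data.cryptoCurrencyMap) and the 'symbol' field are taken from the snippet above, while the 'name' field, the helper names extractField and fetchText, and the use of the global fetch (Node 18+ or a browser) are my assumptions and may need adjusting to what the endpoint really returns.

```javascript
// Sketch only: data.cryptoCurrencyMap and 'symbol' come from the answer above;
// any other field (for example 'name') is an assumption about the payload.

// Parse the raw JSON text and collect one field from every entry.
// Entries that lack the field are skipped.
function extractField(responseText, field = 'symbol') {
  const obj = JSON.parse(responseText);
  const entries = obj?.data?.cryptoCurrencyMap ?? [];
  return entries
    .map((entry) => entry[field])
    .filter((value) => value !== undefined);
}

// Hypothetical helper: download a URL and return the raw response body.
// Relies on the global fetch available in Node 18+ and in browsers.
async function fetchText(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}
```

With the URL from above stored in a variable url, something like fetchText(url).then((text) => console.log(extractField(text, 'name'))) should print the names, provided the entries really carry a name field. Keeping the parsing separate from the download makes it easy to try extractField on a saved response first.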