Can't extract some tags from a web page

Time:08-10

I am scraping some data from this URL:

https://www.degruyter.com/search?query=*&startItem=0&pageSize=10&sortBy=relevance&documentTypeFacet=journal

When I try to get the journal names, I get nothing back. Selectors for some other tags return results, but the ones for the journal names return nothing. The journal names are in a div with the class "resultTitle", but when I try the following in Scrapy,

response.css("div.resultTitle").get() returns nothing. I have also tried BeautifulSoup.

CodePudding user response:

It seems that the block you want, the one containing "resultTitle", is loaded by JavaScript, from xxxxxxxx-main.js:

...
        a.loginContentPromise.then((()=>{
            const e = document.querySelector("#session-redirect");
            if (e) {
                const t = e.dataset.destination || "/";
                window.location.replace(t)
            }
        }
        )),
...

You can see the block below in the response if you fetch the page with the "wget" command instead of a web browser:

...
    <main id="main" class='language_en px-0 min-vh-100 container-fluid'>
        
    <div id="session-redirect" data-destination='/search?query=*&amp;startItem=0&amp;pageSize=10&amp;sortBy=relevance&amp;documentTypeFacet=journal'></div>

    </main>
...
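One way to confirm that a fetched page is this placeholder rather than the real results is to look for the "session-redirect" div in the raw HTML. The following is a minimal sketch using only Python's standard library; the names RedirectFinder and find_session_redirect are made up for illustration, and the sample string is trimmed from the wget output above.

```python
from html.parser import HTMLParser


class RedirectFinder(HTMLParser):
    """Records the data-destination of <div id="session-redirect">, if any."""

    def __init__(self):
        super().__init__()
        self.destination = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div" and attr_map.get("id") == "session-redirect":
            # Mirror the JS fallback: e.dataset.destination || "/"
            # HTMLParser has already unescaped &amp; in the attribute value.
            self.destination = attr_map.get("data-destination") or "/"


def find_session_redirect(html):
    """Return the redirect target, or None if the placeholder div is absent."""
    finder = RedirectFinder()
    finder.feed(html)
    return finder.destination


# Sample trimmed from the wget output shown above.
sample = """<main id="main" class='language_en px-0 min-vh-100 container-fluid'>
    <div id="session-redirect" data-destination='/search?query=*&amp;startItem=0&amp;pageSize=10&amp;sortBy=relevance&amp;documentTypeFacet=journal'></div>
</main>"""

print(find_session_redirect(sample))
```

If this returns a path instead of None, the response contains no "resultTitle" elements yet, which is consistent with response.css("div.resultTitle").get() coming back empty.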

You can read the "xxxxxxxx-main.js" code and reimplement what it does, or simply use Splash to render the page.
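As a rough sketch of the Splash route: Splash exposes an HTTP endpoint, render.html, that takes the target address in a "url" argument and an optional "wait" in seconds, and returns the HTML after JavaScript has run. The code below only builds that request URL. It assumes a Splash instance on localhost:8050 (the default port) and a 2-second wait, both of which you would adjust; the helper name splash_render_url is hypothetical. Whether Splash gets past the session redirect on this particular site has not been verified here.

```python
from urllib.parse import urlencode

# Assumption: a local Splash instance listening on its default port.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url, wait=2.0, splash_base=SPLASH_BASE):
    """Build a render.html URL asking Splash to load target_url and wait
    `wait` seconds for scripts to run before returning the HTML."""
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


target = (
    "https://www.degruyter.com/search?query=*&startItem=0&pageSize=10"
    "&sortBy=relevance&documentTypeFacet=journal"
)
print(splash_render_url(target))
```

In a spider, you would request the printed URL instead of the original one and then run response.css("div.resultTitle") on the rendered HTML.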

P.S.

wget -O search_result.html https://www.degruyter.com/search\?query\=\*\&startItem\=0\&pageSize\=10\&sortBy\=relevance\&documentTypeFacet\=journal 