How to scrape pop-up text using rvest?

Time:09-24

I'd like to scrape information from the following website: https://www.theglobaleconomy.com/download-data.php

As you will see, each economic variable has an associated info box, such as the one in this picture, which pops up when you click on the "i" icon: https://i.stack.imgur.com/E3JRy.png

SelectorGadget and the page inspector both suggest "#definitionBoxText" as the CSS selector, but it doesn't work. When I run nodes <- read_html("https://www.theglobaleconomy.com/download-data.php") %>% html_nodes("#definitionBoxText") %>% html_text(), I get nothing in return, just a blank. Could you please guide me on how I can get this information? Any help is greatly appreciated!

Answer:

It looks like the text in #definitionBoxText is filled in by a script only when you click on the information icon, so the element is empty in the HTML that read_html() receives. rvest does not run scripts or simulate clicks, which means you won't be able to scrape that element directly unless you use something like RSelenium and simulate a click on each icon.

An alternative is to open the developer tools by pressing F12, go to the Sources tab, and save the file called download-data.php, which contains all the definitions you are looking for. You can then scrape that file separately. The scrapable part looks like this:

<div class="indicatorsName">
    Economic growth: the rate of change of real GDP
</div>

<div class="infoIcon">
    <div class="showDefinition"
        style="margin: 4px 3px 0; padding: 1px 6px 0;  border-radius: 10px; border: 1px solid #333; color: #333; float: right; font-weight: bold; font-size;10px">
        i
    </div>
</div>

<div class="clearer"></div>

<div class="definition">
    <b>Economic growth: the rate of change of real GDP</b><br /><br />
    Definition:
    Annual percentage growth rate of GDP at market prices based on constant local currency. Aggregates are based on
    constant 2010 U.S. dollars. GDP is the sum of gross value added by all resident producers in the economy plus any
    product taxes and minus any subsidies not included in the value of the products. It is calculated without making
    deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
</div>
</div>
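
As a rough sketch of that second approach, the definitions can be pulled out of the saved markup with rvest by selecting the .indicatorsName and .definition classes shown above. The inline HTML string below is a shortened stand-in for the saved download-data.php file, and the pairing of names with definitions by position is an assumption about the page layout, not something confirmed against the live site.

```r
library(rvest)

# Shortened stand-in for the saved page (assumption: the real file repeats
# this indicatorsName / definition pattern once per indicator).
# With the saved file you would call read_html("download-data.php") instead.
html <- '
<div>
  <div class="indicatorsName">
    Economic growth: the rate of change of real GDP
  </div>
  <div class="infoIcon"><div class="showDefinition">i</div></div>
  <div class="clearer"></div>
  <div class="definition">
    <b>Economic growth: the rate of change of real GDP</b><br /><br />
    Definition:
    Annual percentage growth rate of GDP at market prices based on constant local currency.
  </div>
</div>'

page <- read_html(html)

# Indicator names, with surrounding whitespace trimmed
indicator <- page %>%
  html_nodes(".indicatorsName") %>%
  html_text(trim = TRUE)

# Raw definition text: bold title, then "Definition:", then the body
raw_definition <- page %>%
  html_nodes(".definition") %>%
  html_text(trim = TRUE)

# Collapse line breaks and runs of spaces, then drop everything up to "Definition:"
clean_definition <- gsub("\\s+", " ", raw_definition)
definition <- sub("^.*?Definition:\\s*", "", clean_definition, perl = TRUE)

# Assumes names and definitions appear in the same order and in equal numbers
definitions <- data.frame(indicator, definition, stringsAsFactors = FALSE)
print(definitions)
```

If the two vectors come back with different lengths on the real file, the positional pairing does not hold and the name would be better taken from the <b> element inside each .definition block.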