How to extract text using RSelenium

Time:11-14

I have the following HTML:

 <h3><a href='jobdetail.php?job=705945'>Job Details</a>: lrp1_vs_Hx1sh2</h3><h4>

With this code, I tried to extract the text lrp1_vs_Hx1sh2 that follows the link:

library(RSelenium)
webpage <- "https://cluspro.bu.edu/models.php?job=705945"
browser <- remoteDriver(port = 5556)
browser$open()
browser$navigate(webpage)

clk <- browser$findElement(using = "link text", "Use the server without the benefits of your own account")
clk$clickElement()

jobs <- browser$findElement(using = 'link text', "Job Details")
jobs$getElementText()

But it returns "Job Details" instead. How can I get lrp1_vs_Hx1sh2?


Update

This is the full HTML:

        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
            <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
        <title>ClusPro 2.0: protein-protein docking</title>
            <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <link rel='stylesheet' type='text/css' href='/css/style.css' />
      <link rel='stylesheet' type='text/css' href='/css/loginform.css' />
      <link rel='stylesheet' type='text/css' href='/css/signupform.css' />
      <link rel='stylesheet' type='text/css' href='/css/contactform.css' />
      <link rel='stylesheet' type='text/css' href='/css/jobsform.css' />
      <link rel='stylesheet' type='text/css' href='/css/goodform.css' />
      <link rel="stylesheet" type="text/css" href="//cdnjs.cloudflare.com/ajax/libs/yui/2.9.0/grids/grids-min.css" />
      <style type="text/css">#tabResults { font-weight:bold; }</style>      <link rel="shortcut icon" href="/favicon.png" type="image/png" />
      <script type="text/javascript" src="/js/jquery-3.5.1.min.js"></script>
      <script type="text/javascript" src="/js/jquery.equalheights.js"></script>
      
                     <script type="text/javascript">
                  
   var models = {
      reinit: function(){
         var showmodels = $('#showmodels').prop('value');
         $('td:gt(' + showmodels + ')').hide();
         $('td:eq(' + showmodels + ')').hide();
         $('td:lt(' + showmodels + ')').show();
         $('#modelslink').prop('href', 'zipmodels.php?job=705945&coeffi=0&nmodels=' + showmodels);
      }
   }
$(document).ready(function(){
   models.reinit();
   $('#showmodels').change(models.reinit);
})
               </script>
    </head>

    <body>
      <div id="doc">
        <div id="hd">
          <ul id='tabs-menu'>
            <li><a id='tabContact' href='/contact.php'>Contact</a></li>
            <li><a id='tabHelp' href='/help.php'>Help</a></li>
            <li><a id='tabPapers' href='/publications.php'>Papers</a></li>
                        <li><a id='tabResults' href='/results.php'>Results</a></li>
            <li><a id='tabQueue' href='/queue.php'>Queue</a></li>
            <li><a id='tabDimer' href='/dimer_predict/submit.php'>Dimer Classification</a></li>
            <li><a id='tabPeptide' href='/peptide/index.php'>Peptide Docking</a></li>
            <li><a id='tabDock' href='/home.php'>Dock</a></li>
          </ul>
          <img src='/image/ClusPro1.png' width='750' height='160' alt=''/>
        </div>
    <div id="bd">
        
          
        <div id='main-header-right'>
          <a href='/logout.php'>sign out</a>
        </div>
       <h3><a href='jobdetail.php?job=705945'>Job Details</a>: lrp1_vs_Hx1sh2</h3><h4><a href='scores.php?job=705945&coeffi=0'>View Model Scores</a></h4><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=0&amp;filetype=model_bz2'>Download all Models for all Coefficients</a><div style="padding-top:1em;">Balanced | <a href='models.php?job=705945&amp;coeffi=2'>Electrostatic-favored</a> | <a href='models.php?job=705945&amp;coeffi=4'>Hydrophobic-favored</a> | <a href='models.php?job=705945&amp;coeffi=6'>VdW Elec</a></div><br /><div>Display Models: <form style='display:inline;'><select id='showmodels'><option value='10'>10</option><option value='15'>15</option><option value='20'>20</option><option value='23'>23</option></select></form></div><br /><a id='modelslink' href=''>Download Displayed Models</a><br /><br /><strong>If you use these models in a paper, please cite our <a href='publications.php'>papers</a></strong><br /><br /><table class='nice' id='models'><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=0&amp;filetype=model_file'>0</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=0&amp;filetype=model_img' alt='0' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=1&amp;filetype=model_file'>1</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=1&amp;filetype=model_img' alt='1' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=2&amp;filetype=model_file'>2</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=2&amp;filetype=model_img' alt='2' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=3&amp;filetype=model_file'>3</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=3&amp;filetype=model_img' alt='3' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=4&amp;filetype=model_file'>4</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=4&amp;filetype=model_img' alt='4' /></td><td><a 
href='file.php?jobid=705945&amp;coeffi=0&amp;model=5&amp;filetype=model_file'>5</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=5&amp;filetype=model_img' alt='5' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=6&amp;filetype=model_file'>6</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=6&amp;filetype=model_img' alt='6' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=7&amp;filetype=model_file'>7</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=7&amp;filetype=model_img' alt='7' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=8&amp;filetype=model_file'>8</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=8&amp;filetype=model_img' alt='8' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=9&amp;filetype=model_file'>9</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=9&amp;filetype=model_img' alt='9' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=10&amp;filetype=model_file'>10</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=10&amp;filetype=model_img' alt='10' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=11&amp;filetype=model_file'>11</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=11&amp;filetype=model_img' alt='11' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=12&amp;filetype=model_file'>12</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=12&amp;filetype=model_img' alt='12' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=13&amp;filetype=model_file'>13</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=13&amp;filetype=model_img' alt='13' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=14&amp;filetype=model_file'>14</a><br /><br /><img 
src='file.php?jobid=705945&amp;coeffi=0&amp;model=14&amp;filetype=model_img' alt='14' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=15&amp;filetype=model_file'>15</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=15&amp;filetype=model_img' alt='15' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=16&amp;filetype=model_file'>16</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=16&amp;filetype=model_img' alt='16' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=17&amp;filetype=model_file'>17</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=17&amp;filetype=model_img' alt='17' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=18&amp;filetype=model_file'>18</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=18&amp;filetype=model_img' alt='18' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=19&amp;filetype=model_file'>19</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=19&amp;filetype=model_img' alt='19' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=20&amp;filetype=model_file'>20</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=20&amp;filetype=model_img' alt='20' /></td><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=21&amp;filetype=model_file'>21</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=21&amp;filetype=model_img' alt='21' /></td></tr><tr><td><a href='file.php?jobid=705945&amp;coeffi=0&amp;model=22&amp;filetype=model_file'>22</a><br /><br /><img src='file.php?jobid=705945&amp;coeffi=0&amp;model=22&amp;filetype=model_img' alt='22' /></td><td>23<br /><br />(image not found)</td></tr></table>        </div>
        <div id="ft">
          ClusPro should only be used for noncommercial purposes.
          <br/>
          <a href='https://www.vajdalab.org' target='_blank'>Vajda Lab</a> and <a href='http://abcgroup.cluspro.org'>ABC Group</a>
          <br/>
          <a href='https://www.bu.edu/'>Boston University</a> and <a href='http://www.stonybrook.edu'>Stony Brook University</a>
        </div>
      </div>

    </body>
  </html>

CodePudding user response:

You can use this XPath:

//div[@id='main-header-right']/following-sibling::h3

That selects the h3 element, whose text is everything inside the tag: "Job Details: lrp1_vs_Hx1sh2".

Once you have the text, split it on spaces and pick the piece you need by index; here the last piece is the job name.
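
The split step could look roughly like this in R. It is a minimal sketch, not verified against the live site: the helper names extract_job_name and get_job_name are hypothetical, and it assumes the h3 text has the form "Job Details: <name>" as in the HTML above. It strips everything up to the first colon instead of splitting on spaces, so a job name that contains spaces would stay intact.

```r
# Hypothetical helper: keep whatever follows the first colon of the <h3> text.
extract_job_name <- function(h3_text) {
  trimws(sub("^[^:]*:", "", h3_text))
}

# Hypothetical wrapper; assumes `browser` is an open RSelenium remoteDriver
# that has already navigated to the models page.
get_job_name <- function(browser) {
  h3 <- browser$findElement(
    using = "xpath",
    "//div[@id='main-header-right']/following-sibling::h3"
  )
  # getElementText() returns a list holding a single string.
  extract_job_name(h3$getElementText()[[1]])
}

# The string handling can be checked without a browser session:
extract_job_name("Job Details: lrp1_vs_Hx1sh2")
```

With a live session, get_job_name(browser) would replace the last two lines of the code in the question.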

CodePudding user response:

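Locating the element by the link text "Job Details" gives you the a element, while the text you need belongs to its parent h3 element. The href attribute sits on the a element, not on the h3, so select the h3 through its child link.
Try this instead:

jobs <- browser$findElement(using = 'xpath', "//h3[a[contains(@href,'jobdetail.php?job=705945')]]")
jobs$getElementText()

Or, in case the job id can change, match on the file name only:

jobs <- browser$findElement(using = 'xpath', "//h3[a[contains(@href,'jobdetail.php')]]")
jobs$getElementText()

Both return the full h3 text, "Job Details: lrp1_vs_Hx1sh2", so the "Job Details: " prefix still has to be stripped.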
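
If the job id changes between runs, the XPath can be built from the URL rather than typed by hand. The sketch below is one way to do that under stated assumptions: job_h3_xpath and job_id_from_url are hypothetical helper names, the id is assumed to be the numeric job= query parameter as in the URL from the question, and the h3 is matched through its child link.

```r
# Build the XPath for the <h3> that wraps the "Job Details" link.
# job_id = NULL matches any job; otherwise the href must contain that id.
job_h3_xpath <- function(job_id = NULL) {
  target <- if (is.null(job_id)) {
    "jobdetail.php"
  } else {
    sprintf("jobdetail.php?job=%s", job_id)
  }
  sprintf("//h3[a[contains(@href,'%s')]]", target)
}

# Pull the numeric job id out of a URL such as ".../models.php?job=705945".
# Returns NA when the URL has no job parameter.
job_id_from_url <- function(url) {
  m <- regmatches(url, regexpr("(?<=[?&]job=)[0-9]+", url, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

webpage <- "https://cluspro.bu.edu/models.php?job=705945"
xpath <- job_h3_xpath(job_id_from_url(webpage))

# With a live session (assumes `browser` is the remoteDriver from the question):
# jobs <- browser$findElement(using = "xpath", xpath)
# jobs$getElementText()[[1]]
```

Calling job_h3_xpath() with no argument gives the id-independent variant.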