Missing data using Beautiful Soup

Time:09-16

I'm trying to get the university names, scores and country names from this website: https://roundranking.com/ranking/world-university-rankings.html#world-2021. I can find the table that holds the data by its class, but the contents of the table's <tbody> disappear when I try to find it with Beautiful Soup.

Here is the original HTML:

<table  style="padding: 0px;">
<thead >
<tr><th >Rank</th><th  style="background-color: rgb(198, 235, 178);">University</th><th >Score</th><th >Country</th><th >Flag</th><th >League</th></tr>
</thead><thead  style="display: none; opacity: 0;">
<tr><th >Rank</th><th  style="background-color: rgb(198, 235, 178);">University</th><th >Score</th><th >Country</th><th >Flag</th><th >League</th></tr>
</thead>
<tbody>
<tr ><td >1</td><td ><a href="/universities/harvard-university.html?sort=O&amp;year=2021&amp;subject=SO">Harvard University</a></td><td >100.000</td><td >USA</td><td ><img src="../images_rur/Flag/Flag_USA.png" alt=""></td><td >Diamond League</td>
...
</tbody>
</table>

And here is the HTML that the soup shows:

<table  style="padding: 0px;">
<thead >
<tr><th >Rank</th><th  style="background-color: rgb(198, 235, 178);">University</th><th >Score</th><th >Country</th><th >Flag</th><th >League</th></tr>
</thead><thead  style="display: none; opacity: 0;">
<tr><th >Rank</th><th  style="background-color: rgb(198, 235, 178);">University</th><th >Score</th><th >Country</th><th >Flag</th><th >League</th></tr>
</thead>
</table>

My Python code that tries to get the data:

import selenium
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome('./chromedriver.exe')
driver.get('https://roundranking.com/ranking/world-university-rankings.html#world-2021')

source = driver.page_source
soup=BeautifulSoup(source)
#soup = BeautifulSoup(source, 'html5lib')
#soup = BeautifulSoup(source, 'html.parser')
#soup = BeautifulSoup(source, 'lxml')

soup.prettify

table=soup.find('table', {'class':'big-table table-sortable uci'})
print(table)

I've tried html5lib, lxml and html.parser, but nothing works. When I print the table, it does not contain the body, which has the data I need.

CodePudding user response:

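The table is generated by JavaScript, so its rows are not part of the HTML the server sends. You can find the request that loads the data in your browser's developer tools (Network tab) and call it directly. Here is an example: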

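import requests

url = "https://roundranking.com/final/ranking-json18r.php"

# Form fields copied from the request the page sends; passing a dict lets
# requests URL-encode the values and set the form Content-Type header.
payload = {"t": "2021", "s": "O", "sa": "SO", "sc": "All Countries"}
response = requests.post(url, data=payload)
for university in response.json():
    print(university['rank'], university['univ'], university['score'], university['economy'], university['league'])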

OUTPUT:

1 Harvard University 100.0 USA Diamond League
2 California Institute of Technology (Caltech) 98.137 USA Diamond League
3 Imperial College London 97.706 UK Diamond League
4 Stanford University 97.604 USA Diamond League
5 Yale University 97.506 USA Diamond League
6 Massachusetts Institute of Technology (MIT) 97.364 USA Diamond League
7 ETH Zurich (Swiss Federal Institute of Technology) 96.187 Switzerland Diamond League
8 Columbia University 95.393 USA Diamond League
9 University of Cambridge 95.258 UK Diamond League
10 University of Oxford 94.989 UK Diamond League
11 University of Chicago 94.712 USA Diamond League
12 Karolinska Institute 94.642 Sweden Diamond League
13 Johns Hopkins University 94.299 USA Diamond League
14 University College London 94.172 UK Diamond League
15 Northwestern University 94.117 USA Diamond League
16 Princeton University 93.993 USA Diamond League
17 Ecole Polytechnique Federale de Lausanne 93.75 Switzerland Diamond League
18 University of Pennsylvania 93.525 USA Diamond League
19 Cornell University 92.271 USA Diamond League
20 Washington University in St. Louis 91.325 USA Diamond League
21 Carnegie Mellon University 90.608 USA Diamond League
22 Scuola Normale Superiore di Pisa 90.345 Italy Diamond League
23 Case Western Reserve University 90.314 USA Diamond League
24 University of Michigan 89.447 USA Diamond League
25 Boston University 89.443 USA Diamond League
26 Brown University 89.043 USA Diamond League
27 Technical University of Denmark 88.842 Denmark Diamond League
...
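
Since the question asks only for the university names, scores and country names, the JSON records can be reduced to those three fields and saved. The following is a minimal sketch, not part of the original answer: the helper names `to_rows` and `to_csv` are made up for illustration, the sample list is hand-written to mirror the output above, and it assumes each record carries the `univ`, `score` and `economy` keys used in the loop. In practice the input would be `response.json()`.

```python
import csv
import io


def to_rows(records):
    """Keep only the name, score and country of each university record."""
    # float() accepts both numbers and numeric strings, since the exact
    # type of "score" in the real response is an assumption here.
    return [(r["univ"], float(r["score"]), r["economy"]) for r in records]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["university", "score", "country"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample shaped like the entries printed above;
# replace it with response.json() when calling the real endpoint.
sample = [
    {"rank": 1, "univ": "Harvard University", "score": 100.0,
     "economy": "USA", "league": "Diamond League"},
    {"rank": 3, "univ": "Imperial College London", "score": 97.706,
     "economy": "UK", "league": "Diamond League"},
]

rows = to_rows(sample)
print(to_csv(rows))
```

Writing the text returned by `to_csv` to a file gives a table that opens directly in a spreadsheet or in pandas.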