I am trying to scrape the table from the iplt20 website, but it keeps returning a blank []

Time:07-04

from bs4 import BeautifulSoup
import requests

url = 'https://www.iplt20.com/stats/2021/most-runs'
source = requests.get(url)
soup = BeautifulSoup(source.text, 'html.parser')
soup.find_all('table', class_='np-mostruns_table')

CodePudding user response:

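That's probably because the page is loaded via JavaScript. I have seen people use MechanicalSoup instead, although note that MechanicalSoup does not execute JavaScript either, so it may not help with this page. https://mechanicalsoup.readthedocs.io/en/stable/tutorial.html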

CodePudding user response:

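The website is rendered entirely with JavaScript, and requests does not execute JavaScript: it only returns the initial HTML, which does not contain the table.

You have to use an automated browser such as Selenium or a similar tool.

When scraping, I also suggest using a browser extension that toggles JavaScript on and off, such as Toggle JS, so you can see what the page contains without JavaScript.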
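The same check can be done from Python without a browser extension. The sketch below uses only the standard library to test whether a table with a given class is present in the raw HTML. The names TableClassFinder and has_table_with_class and both sample HTML strings are made up for illustration; in practice you would pass in the text returned by requests.

```python
from html.parser import HTMLParser


class TableClassFinder(HTMLParser):
    """Records whether any <table> tag carries the given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.found = True


def has_table_with_class(html, css_class):
    """Return True if the raw HTML contains <table class="... css_class ...">."""
    finder = TableClassFinder(css_class)
    finder.feed(html)
    return finder.found


# Hypothetical stand-ins for a JavaScript-rendered page and a static page.
js_rendered = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
static_page = '<html><body><table class="np-mostruns_table stats"><tr><td>1</td></tr></table></body></html>'

print(has_table_with_class(js_rendered, "np-mostruns_table"))  # False
print(has_table_with_class(static_page, "np-mostruns_table"))  # True
```

If this returns False for the HTML you downloaded, the table is most likely added by JavaScript after the page loads, and no HTML parser will find it in that response.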

CodePudding user response:

If you want to find a table by its class, you can use:

soup.find("table",{"class":"np-mostruns_table"})

CodePudding user response:

You can't get the table from the page HTML because it's loaded dynamically. You need to find the request that loads the data and build your table from its response. The response has many more fields than are shown on the site, so you can add any additional fields you need. The example below uses only the fields that appear on the site.

import requests
import json
import pandas as pd


url = 'https://ipl-stats-sports-mechanic.s3.ap-south-1.amazonaws.com/ipl/feeds/stats/60-toprunsscorers.js?callback=ontoprunsscorers'
results = []
response = requests.get(url)
# The response is JSONP: strip the callback wrapper and parse the JSON inside the parentheses
json_data = json.loads(response.text[response.text.find('(') + 1:response.text.rfind(')')])
for player in json_data['toprunsscorers']:
    data = {
        'Player': player['StrikerName'],
        'Mat': player['Matches'],
        'Inns': player['Innings'],
        'NO': player['NotOuts'],
        'Runs': player['TotalRuns'],
        'HS': player['HighestScore'],
        'AVG': player['BattingAverage'],
        'BF': player['Balls'],
        'SR': player['StrikeRate'],
        '100': player['Centuries'],
        '50': player['FiftyPlusRuns'],
        '4s': player['Fours'],
        '6s': player['Sixes']
    }
    results.append(data)
df = pd.DataFrame(results)
print(df)

OUTPUT:

                  Player Mat Inns NO Runs    HS  ...   BF      SR 100 50  4s  6s
0            Jos Buttler  17   17  2  863   116  ...  579  149.05   4  4  83  45
1              K L Rahul  15   15  3  616  103*  ...  455  135.38   2  4  45  30
2        Quinton De Kock  15   15  1  508  140*  ...  341  148.97   1  3  47  23
3          Hardik Pandya  15   15  4  487   87*  ...  371  131.26   0  4  49  12
4           Shubman Gill  16   16  2  483    96  ...  365  132.32   0  4  51  11
..                   ...  ..  ... ..  ...   ...  ...  ...     ...  .. ..  ..  ..
157     Fazalhaq Farooqi   3    1  1    2    2*  ...    8   25.00   0  0   0   0
158   Jagadeesha Suchith   5    2  0    2     2  ...    8   25.00   0  0   0   0
159          Tim Southee   9    5  1    2    1*  ...   12   16.66   0  0   0   0
160  Nathan Coulter-Nile   1    1  1    1    1*  ...    2   50.00   0  0   0   0
161        Anrich Nortje   6    1  1    1    1*  ...    6   16.66   0  0   0   0
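
The slicing step in the code above removes the JSONP callback wrapper around the JSON. A minimal standalone sketch of that step is shown below. The helper name unwrap_jsonp and the sample string are made up for illustration and are not real data from the feed.

```python
import json


def unwrap_jsonp(text):
    """Return the parsed JSON payload from a JSONP response like callback({...});"""
    start = text.find("(")
    # Use the last ')' so parentheses inside the JSON itself do not cut the payload short
    end = text.rfind(")")
    if start == -1 or end == -1 or end < start:
        raise ValueError("not a JSONP response")
    return json.loads(text[start + 1:end])


# Hypothetical sample shaped like the feed response, not actual data.
sample = 'ontoprunsscorers({"toprunsscorers": [{"StrikerName": "Player (A)", "TotalRuns": "10"}]});'
data = unwrap_jsonp(sample)
print(data["toprunsscorers"][0]["StrikerName"])  # Player (A)
```

Searching for the closing parenthesis from the right keeps the slice correct even when a field value contains parentheses of its own.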