Unable to load a table with a "Load more" option on a website using Python

I need to scrape the full table from this site, which has a "Load more" option.

Currently, when I scrape the page, I only get the rows that are shown by default when the page loads.

from io import StringIO

import pandas as pd
import requests

URL2 = "https://www.mykhel.com/football/indian-super-league-player-stats-l750/"
header = {'Accept-Language': "en-US,en;q=0.9",
          'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                        "(KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
          }

# Download the page HTML as it is served, before any JavaScript runs
resp2 = requests.get(url=URL2, headers=header).text

# Parse every <table> in the HTML; wrapping the text in StringIO avoids the
# literal-string deprecation warning in newer pandas versions
tables2 = pd.read_html(StringIO(resp2))
overview_table2 = tables2[0]
overview_table2

	Player Name	Team	Matches	Goals	Time Played	Unnamed: 5
0	Jorge Pereyra Diaz	Mumbai City	9	6	538 Mins	NaN
1	Cleiton Silva	SC East Bengal	8	5	707 Mins	NaN
2	Abdenasser El Khayati	Chennaiyin FC	5	4	231 Mins	NaN
3	Lallianzuala Chhangte	Mumbai City	9	4	737 Mins	NaN
4	Nandhakumar Sekar	Odisha	8	4	673 Mins	NaN
5	Ivan Kalyuzhnyi	Kerala Blasters	7	4	428 Mins	NaN
6	Bipin Singh	Mumbai City	9	4	806 Mins	NaN
7	Noah Sadaoui	Goa	8	4	489 Mins	NaN
8	Diego Mauricio	Odisha	8	3	526 Mins	NaN
9	Pedro Martin	Odisha	8	3	263 Mins	NaN
10	Dimitri Petratos	ATK Mohun Bagan	6	3	517 Mins	NaN
11	Petar Sliskovic	Chennaiyin FC	8	3	662 Mins	NaN
12	Holicharan Narzary	Hyderabad	9	3	705 Mins	NaN
13	Dimitrios Diamantakos	Kerala Blasters	7	3	529 Mins	NaN
14	Alberto Noguera	Mumbai City	9	3	371 Mins	NaN
15	Jerry Mawihmingthanga	Odisha	8	3	611 Mins	NaN
16	Hugo Boumous	ATK Mohun Bagan	7	2	580 Mins	NaN
17	Javi Hernandez	Bengaluru	6	2	397 Mins	NaN
18	Borja Herrera	Hyderabad	9	2	314 Mins	NaN
19	Mohammad Yasir	Hyderabad	9	2	777 Mins	NaN
20	Load More....	Load More....	Load More....	Load More....	Load More....	Load More....

But I need the full table, including the data under "Load more". Please help.

CodePudding user response：

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0'
}


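def main(url):
    # Query parameters for the stats endpoint; limit=300 asks for every
    # player in one response instead of one "Load more" page at a time
    params = {
        "action": "stats",
        "league_id": "750",
        "limit": "300",
        "offset": "0",
        "part": "leagues",
        "season_id": "2022",
        "section": "football",
        "stats_type": "player",
        "tab": "overview"
    }
    r = requests.get(url, headers=headers, params=params)
    soup = BeautifulSoup(r.text, 'lxml')
    # Each player link carries the name in its title attribute; the next four
    # <td> cells hold the team, matches, goals and time played
    goal = [(x['title'], *[i.get_text(strip=True) for i in x.find_all_next('td', limit=4)])
            for x in soup.select('a.player_link')]
    df = pd.DataFrame(
        goal, columns=['Name', 'Team', 'Matches', 'Goals', 'Time Played'])
    print(df)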


main('https://www.mykhel.com/src/index.php')

Output:

                      Name              Team Matches Goals Time Played
0       Jorge Pereyra Diaz       Mumbai City       9     6    538 Mins
1            Cleiton Silva    SC East Bengal       8     5    707 Mins
2    Abdenasser El Khayati     Chennaiyin FC       5     4    231 Mins
3    Lallianzuala Chhangte       Mumbai City       9     4    737 Mins
4        Nandhakumar Sekar            Odisha       8     4    673 Mins
..                     ...               ...     ...   ...         ...
268          Sarthak Golui    SC East Bengal       6     0    402 Mins
269          Ivan Gonzalez    SC East Bengal       8     0    683 Mins
270       Michael Jakobsen  NorthEast United       8     0    676 Mins
271       Pratik Chowdhary     Jamshedpur FC       6     0    495 Mins
272         Chungnunga Lal    SC East Bengal       8     0    720 Mins

[273 rows x 5 columns]
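
The request above asks for 300 rows in a single call. If an endpoint like this capped the page size, the same "offset" and "limit" parameters could in principle be walked in a loop. The sketch below shows only that loop; fetch_all_rows and fake_fetch are hypothetical names, and fake_fetch is an in-memory stand-in for a function that would send the request with the given offset and limit and return the parsed rows. Whether the real endpoint honors smaller limits and non-zero offsets is an assumption that has not been checked here.

```python
from typing import Callable, List, Sequence


def fetch_all_rows(fetch_page: Callable[[int, int], Sequence[tuple]],
                   page_size: int = 50,
                   max_pages: int = 100) -> List[tuple]:
    """Call fetch_page(offset, limit) repeatedly until a page comes back short."""
    rows: List[tuple] = []
    for page in range(max_pages):
        batch = list(fetch_page(page * page_size, page_size))
        rows.extend(batch)
        # A page with fewer rows than requested means there is nothing left
        if len(batch) < page_size:
            break
    return rows


# Stand-in data source: 273 fake rows served in slices, no network involved
fake_rows = [(f"Player {i}", "Team", 1, 0, "90 Mins") for i in range(273)]


def fake_fetch(offset: int, limit: int) -> Sequence[tuple]:
    return fake_rows[offset:offset + limit]


rows = fetch_all_rows(fake_fetch, page_size=50)
print(len(rows))  # 273
```

In real use, fake_fetch would be replaced by a function that calls requests.get with the params dictionary shown above, sets "offset" and "limit" from its arguments, and returns the list of tuples built from the response. The max_pages cap is there so that a misbehaving endpoint cannot keep the loop running forever.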

CodePudding user response：

This is a dynamically loaded page, so you cannot parse all of its contents without clicking a button.
Well, maybe you can with XHR or something like that; maybe someone will contribute that to the answers here.

I'll stick to handling dynamically loaded pages with the Selenium browser automation suite.

Installation

To get started, you'll need to install the Selenium bindings:

pip install selenium

You seem to already have BeautifulSoup, but for anyone else who comes across this answer: we'll also need it and html5lib later to parse the table:

pip install html5lib BeautifulSoup4

Now, for Selenium to work you'll need a driver installed for the browser of your choice. To get the drivers you can use Selenium Manager, driver management software, or download them manually. The first two options are fairly new; I've had my manually downloaded drivers for ages, so I'll stick with them. I'll repeat the download links here:

Browser	Link to driver download
Chrome:	https://sites.google.com/chromium.org/driver/
Edge:	https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Firefox:	https://github.com/mozilla/geckodriver/releases
Safari:	https://webkit.org/blog/6900/webdriver-support-in-safari-10/
Opera:	https://github.com/operasoftware/operachromiumdriver/releases
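
As a minimal sketch of how a manually downloaded driver might be wired in, assuming Selenium 4 and Chrome: make_chrome_driver is a hypothetical helper name and the path in the usage comment is a placeholder, not a real location.

```python
def make_chrome_driver(driver_path: str):
    """Start Chrome through a manually downloaded chromedriver binary."""
    # Imported inside the function so this file can be loaded even on a
    # machine where Selenium is not installed yet
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Service tells Selenium which driver executable to launch
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service)


# Usage (placeholder path, adjust to wherever the driver was unpacked):
# driver = make_chrome_driver("/path/to/chromedriver")
# driver.get("https://www.mykhel.com/football/indian-super-league-player-stats-l750/")
# driver.quit()
```

The other browsers in the table have their own driver and Service classes, so the same shape should carry over with the matching imports.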

You can use any browser, e.g. Brave, Yandex Browser, basically any Chromium-based browser of your choice, or even Tor Browser