Scrape stock data from subpages of a website without change of address

Based on

With the code above I only get the first subpage. How can I modify it to collect the data from all 17 subpages together?

CodePudding user response：

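The table data is retrieved via a POST request. The tutorial gets you the initial page, but it doesn't show how to go on from there. Each subpage is requested by sending the same form data with a different K_PAGE value, and you keep going until a page comes back empty: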

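from io import StringIO

import requests
import pandas as pd

url = 'https://www.boerse.de/ajax/table.php'

# Form fields sent with every POST request; K_PAGE is added per iteration.
payload = {
    'LISTID': 'US78378X1072@16',
    'K_SORTCOLUM': 'N',
    'K_SORTORDER': 'asc',
    'K_PAGE_SIZE': '30'}

page = 0
frames = []
while True:
    payload['K_PAGE'] = str(page)
    print('Page: %s' % (page + 1))
    response = requests.post(url, data=payload)
    # Newer pandas versions expect a file-like object rather than a raw HTML string.
    df = pd.read_html(StringIO(response.text))[0]
    # Keep only the rows that have a name and an empty 'Unnamed: 0' cell.
    df = df.dropna(subset=['Name'])
    df = df[df['Unnamed: 0'].isnull()].drop(['Unnamed: 0'], axis=1)

    # An empty page means we are past the last subpage.
    if len(df) == 0:
        print('Finished.')
        break

    frames.append(df)
    page += 1

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
results = pd.concat(frames, sort=False).reset_index(drop=True) if frames else pd.DataFrame()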

Output:

print(results)
                     Name    Bid    Ask   Akt.  ... Tagesvol. Vortag      Zeit Börse
0              3M Company  15625  15660  15643  ...       NaN  15370  14:39:13   LSR
1              A.O. Smith   7186   7300   7243  ...       NaN   7167  14:13:52   LSR
2         Abbott Champion  11175  11215  11195  ...       NaN  11083  14:37:07   LSR
3                  Abbvie  10408  10430  10419  ...       NaN  10260  14:41:41   LSR
4                 Abiomed  29260  29530  29395  ...       NaN  28915  14:41:53   LSR
..                    ...    ...    ...    ...  ...       ...    ...       ...   ...
497  Yum! Brands Champion  11155  11200  11178  ...       NaN  10963  14:53:58   LSR
498    Zebra Technologies  53000  53620  53310  ...       NaN  52330  14:54:01   LSR
499       Zimmer Holdings  11245  11305  11275  ...       NaN  11118  14:50:51   LSR
500  Zions Bancorporation   5700   5750   5725  ...       NaN   5625  14:47:18   LSR
501                Zoetis  19585  19685  19635  ...       NaN  19100  14:53:56   LSR

[502 rows x 13 columns]
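
If you want to check the "keep requesting pages until one comes back empty" logic without hitting the site, here is a minimal offline sketch. It is not the code from the answer: collect_all_pages, fake_fetch, FAKE_ROWS and PAGE_SIZE are made-up names, and the fake fetcher just slices a local list where the real code would send the POST request. With 502 rows and a page size of 30, it goes through 17 non-empty pages, which matches the 17 subpages in the question.

```python
from typing import Callable, List


def collect_all_pages(fetch_page: Callable[[int], List[dict]]) -> List[dict]:
    """Call fetch_page(0), fetch_page(1), ... until a page comes back empty."""
    rows: List[dict] = []
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            # Empty page: we are past the last subpage.
            break
        rows.extend(batch)
        page += 1
    return rows


# Hypothetical stand-in for the server: 502 rows served 30 at a time.
FAKE_ROWS = [{"Name": "Stock %d" % i} for i in range(502)]
PAGE_SIZE = 30
pages_requested: List[int] = []


def fake_fetch(page: int) -> List[dict]:
    """Return one page of FAKE_ROWS and record which page was asked for."""
    pages_requested.append(page)
    start = page * PAGE_SIZE
    return FAKE_ROWS[start:start + PAGE_SIZE]


all_rows = collect_all_pages(fake_fetch)
# The last request is the empty page that ends the loop, so subtract one.
print(len(all_rows), "rows from", len(pages_requested) - 1, "non-empty pages")
```

Passing the fetch function in as an argument keeps the loop separate from the network call, so the same loop could be reused with a function that wraps requests.post and pd.read_html.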