Scrape multiple pages through pandas

I want to scrape multiple pages, but I only get the result of the last page. This is the page link: https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/

import pandas as pd

for page in range(1,26):
    df=pd.read_html('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}'.format(page=page))
    df[0].to_csv('tab.csv',index=False)

CodePudding user response:

That's because you always write to the same file: each iteration overwrites tab.csv, so only the data scraped from the last page remains.

One solution is to create a new file for every page, like this:

import pandas as pd

for page in range(1,26):
    df = pd.read_html('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}'.format(page=page))
    df[0].to_csv(f"tab-{page}.csv",index=False)

Or, if you want a single file, you can use append mode when writing the CSV file:

import pandas as pd

for page in range(1,26):
    df = pd.read_html('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}'.format(page=page))
    df[0].to_csv('tab.csv', mode='a', index=False, header=False)

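mode='a': Use append mode instead of the default write mode ('w'), which overwrites the file on every call.
index=False: Do not include an index column when appending the new data.
header=False: Do not write the column names when appending the new data. Because this applies to every page, the resulting file has no header row at all.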
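If the single file should still start with one header row, a minimal sketch along these lines may help. The helper name append_table is hypothetical, and the sketch assumes that "the file does not exist yet" is a good enough signal for "this is the first page":

```python
import os

import pandas as pd


def append_table(table, path):
    """Append `table` to the CSV at `path`, writing the header only once.

    `append_table` is a hypothetical helper, not part of pandas.
    """
    # The column names are written only when the file is about to be created.
    write_header = not os.path.exists(path)
    table.to_csv(path, mode="a", index=False, header=write_header)
```

Inside the loop, append_table(df[0], 'tab.csv') would then replace the to_csv call.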

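NOTE: Append mode creates tab.csv if it does not exist, but it never clears an existing file. Delete tab.csv before re-running the script, or the rows will be duplicated.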
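Another option, sketched here under a few assumptions, is to collect the table from every page in memory and write the file once with pd.concat. The function name scrape_pages and its read_page parameter are hypothetical; read_page exists only so the fetching step can be swapped out, and by default it takes the first table that pd.read_html finds, as the code above does:

```python
import pandas as pd

# Same URL as in the question, with a placeholder for the page number.
BASE_URL = (
    "https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/"
    "?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}"
)


def scrape_pages(first, last, read_page=None):
    """Return one DataFrame holding the first table of pages first..last."""
    if read_page is None:
        # Assumption: the table of interest is the first one on each page.
        def read_page(url):
            return pd.read_html(url)[0]

    frames = [read_page(BASE_URL.format(page=page))
              for page in range(first, last + 1)]
    # ignore_index=True renumbers the rows 0..n-1 across all pages.
    return pd.concat(frames, ignore_index=True)
```

Calling scrape_pages(1, 25).to_csv('tab.csv', index=False) would then write all pages in one go, with a single header row and no leftover rows from earlier runs.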