I want to scrape multiple pages, but my code only gives me the result of the last page. This is the page link: https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/
import pandas as pd

for page in range(1, 26):
    df = pd.read_html('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}'.format(page=page))
    df[0].to_csv('tab.csv', index=False)
CodePudding user response:
That's because you write to the same file on every iteration, so each page overwrites the previous one and you only keep the last scraped page.
A solution to your problem is to create a new file every time like this:
import pandas as pd

for page in range(1, 26):
    df = pd.read_html('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}'.format(page=page))
    # One output file per page: tab-1.csv, tab-2.csv, ...
    df[0].to_csv(f"tab-{page}.csv", index=False)
Or, if you want a single file, you can use append mode when writing the CSV file:
import pandas as pd

for page in range(1, 26):
    df = pd.read_html('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}'.format(page=page))
    # Append this page's rows to tab.csv instead of overwriting it
    df[0].to_csv('tab.csv', mode='a', index=False, header=False)
mode='a': Use append mode as opposed to 'w', the default write mode.
index=False: Do not include an index column when appending the new data.
header=False: Do not include a header row when appending the new data.
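With header=False on every page, the resulting file has no header row at all. If you want the column names once at the top, one possible variation is to write the header only for the first page. This is a minimal sketch; the helper name append_table and its first argument are made up for illustration and are not part of pandas:

```python
import pandas as pd

def append_table(df, path, first):
    """Write df to path; hypothetical helper, not a pandas function.

    first=True  -> start a fresh file ('w') and include the header row.
    first=False -> append rows only ('a'), without repeating the header.
    """
    df.to_csv(path, mode='w' if first else 'a', index=False, header=first)
```

Inside the loop you would call it as append_table(df[0], 'tab.csv', first=(page == 1)). Using 'w' for the first page also clears any tab.csv left over from an earlier run.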
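NOTE: Append mode creates the file if it does not exist yet, but it also keeps whatever is already in it. Delete an old tab.csv before re-running the script, otherwise the same rows will be appended again.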
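Another option, if everything fits in memory, is to skip append mode entirely: collect the table from each page in a list, join them with pd.concat, and write the file once. The sketch below assumes the table you want is always the first one on the page, as in the code above. The names BASE_URL and scrape_pages, and the read parameter (which only exists so the function can be tried without network access), are illustrative and not from the original code:

```python
import pandas as pd

# Same URL as in the question, with a placeholder for the page number
BASE_URL = ('https://www.baroul-cluj.ro/tabloul-avocatilor/avocati-definitivi/'
            '?wpv_view_count=9662&wpv_post_search=&wpv_paged={page}')

def scrape_pages(first, last, read=pd.read_html):
    """Return the first table of pages first..last as a single DataFrame."""
    frames = []
    for page in range(first, last + 1):
        tables = read(BASE_URL.format(page=page))  # list of DataFrames
        frames.append(tables[0])
    # ignore_index=True renumbers the rows 0..n-1 across all pages
    return pd.concat(frames, ignore_index=True)
```

You would then write the result with scrape_pages(1, 25).to_csv('tab.csv', index=False), which gives you one file with a single header row and no leftovers from previous runs.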