According to the Playwright documentation, opening a new tab in the browser should work as shown in the scrap_post_info() function below. However, it fails to do so.
What I am trying to do is loop through each URL in the posts list variable and open that URL in a new tab to scrape the post details. After a post has been scraped, the tab should be closed and the next URL opened in a new tab, repeating until the last URL in the posts list has been scraped.
```python
# Loop through each URL from the `posts` list variable that contains many posts' URLs
for post in posts:
    scrap_post_info(context, post)

def scrap_post_info(context, post):
    with context.expect_page() as new_page_info:
        page.click('a[target="_blank"]')  # Opens a new tab
    new_page = new_page_info.value

    new_page.wait_for_load_state()
    print(new_page.title())
```
Answer:
I'm doing something similar in a project of mine, and this is how I would do it.
```python
from playwright.sync_api import sync_playwright

posts = ['https://playwright.dev/', 'https://playwright.dev/python/']

def scrap_post_info(context, post):
    page = context.new_page()  # open a fresh tab in the same context
    page.goto(post)
    print(page.title())
    # do whatever scraping you need to
    page.close()  # close the tab before moving on to the next post

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    for post in posts:
        scrap_post_info(context, post)
        # some time delay
    browser.close()
```
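If scraping a post can raise (a selector that isn't there, a timeout), the `page.close()` at the end of scrap_post_info() is skipped and the tab leaks. A minimal sketch of a more defensive variant, assuming `context` is a Playwright `BrowserContext`: a context manager closes the tab in a `finally` block, and the "some time delay" placeholder becomes a `delay` parameter. The names `new_tab`, `scrape_all`, `scrape` and `delay` are hypothetical, not part of Playwright.

```python
import time
from contextlib import contextmanager


@contextmanager
def new_tab(context):
    """Open a fresh tab in `context` and always close it afterwards."""
    page = context.new_page()
    try:
        yield page
    finally:
        page.close()  # runs even if the scraping code raised an exception


def scrape_all(context, urls, scrape, delay=1.0):
    """Visit each URL in its own tab, one at a time, collecting scrape(page).

    `scrape` is a hypothetical callback that receives the loaded page and
    returns whatever data you extract; `delay` (seconds) is an assumed
    polite pause after each post.
    """
    results = []
    for url in urls:
        with new_tab(context) as page:
            page.goto(url)
            results.append(scrape(page))
        time.sleep(delay)  # pause after each post before opening the next tab
    return results
```

With the setup above, this could be called inside the `with sync_playwright() as p:` block as `scrape_all(context, posts, lambda page: page.title())`.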
The thing is, the code snippet from the Playwright docs is about capturing a new page that opens after you click a link on an existing page. Since you already have the URLs, you can just open each one with `context.new_page()`, visit it, and do your scraping, one page at a time.
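For the case the docs snippet does cover, where the new tab only appears after clicking a `target="_blank"` link, a minimal sketch assuming `page` is an already-open Playwright `Page` that contains the link (the helper name `open_link_in_new_tab` is hypothetical). Passing `page` in explicitly avoids relying on an undefined global, and the click has to happen inside `expect_page()` so the new tab's event is captured.

```python
def open_link_in_new_tab(page, selector):
    """Click `selector` on `page` and return the tab that the click opens."""
    # Start waiting for a new page in this page's browser context *before*
    # clicking, so the event fired by the click is not missed.
    with page.context.expect_page() as new_page_info:
        page.click(selector)
    # The new page is available once the `with` block has exited.
    new_page = new_page_info.value
    new_page.wait_for_load_state()
    return new_page
```

Usage would look like `tab = open_link_in_new_tab(page, 'a[target="_blank"]')`, followed by your scraping and `tab.close()`.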