How to select a specific button in Puppeteer

Time:12-23

So I'm building a program that scrapes Poshmark webpages and extracts the username of every seller on the page!

I want it to go through every page using the 'next' button, but there are 6 buttons, all with the same class name...

Here's the link: https://poshmark.com/category/Men-Jackets_&_Coats?sort_by=like_count&all_size=true&my_size=false

(In my own Google Chrome this page has infinite scroll, hence the scrollToBottom async function I started writing, but I realized that inside Puppeteer's Chrome it has 'next page' buttons.)
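The scrollToBottom function in the question's code never leaves its while (true) loop. Once the page stops growing, page.waitForFunction also rejects after its default timeout.

Below is a minimal sketch of a bounded version. It assumes only Puppeteer's page.evaluate. The maxRounds and pauseMs parameters are hypothetical, and their defaults are arbitrary:

```javascript
// Scroll until the document height stops growing, or until maxRounds is reached.
// `page` is expected to provide Puppeteer's page.evaluate(fn).
// Returns the number of scroll rounds performed.
async function scrollToBottom(page, { maxRounds = 20, pauseMs = 1000 } = {}) {
  let previousHeight = await page.evaluate(() => document.body.scrollHeight);

  for (let round = 0; round < maxRounds; round++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

    // Give the page time to load the next batch of items.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));

    const currentHeight = await page.evaluate(() => document.body.scrollHeight);
    if (currentHeight <= previousHeight) {
      return round + 1; // height stopped growing: nothing more to load
    }
    previousHeight = currentHeight;
  }
  return maxRounds;
}
```

In main() this could be called as await scrollToBottom(page) before reading the usernames. Capping the rounds keeps the loop from running forever on a page that keeps loading.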

The window displays buttons for pages 1-5, followed by the 'next page' button.

The problem is that all of the buttons share the same HTML class name, so I'm not sure how to tell them apart.

const e = require('express');
const puppeteer = require('puppeteer');
const url = "https://poshmark.com/category/Men-Jackets_&_Coats?sort_by=like_count&all_size=true&my_size=false";
let usernames = [];

 const initItemArea = async (page) => {

    const itemArea = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('.tc--g.m--l--1.ellipses')).map(x => x.textContent);
    });
 }

 const pushToArray =  async (itemArea, page) => {

    itemArea.forEach(function (element) {
        //console.log('username: ', $(element).text());
        usernames.push(element);
    });

 };

 const scrollToBottom = async (itemArea, page) => {

    while (true) {

        previousHeight = await page.evaluate('document.body.scrollHeight');
        await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
        await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`);
    
        await new Promise((resolve) => setTimeout(resolve, 1000));

        await page.screenshot({path : "ss.png"})
    }
};


const gotoNextPage = async (page) => {
    await page.waitForSelector(".button.btn.btn--pagination");

    const nextButton = await page.evaluate((page) => {
        document.querySelector(".button.btn.btn--pagination")
    });
    
    await page.click(nextButton);
    console.log('Next Page Loading')

};


async function main() {
 
    const client = await puppeteer.launch({
        headless: false,
        executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    });

    const page = await client.newPage();
    await page.goto(url);
    await page.waitForSelector(".tc--g.m--l--1.ellipses");

    const itemArea = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('.tc--g.m--l--1.ellipses')).map(x => x.textContent);
    });


    gotoNextPage(page)
    
};

main();

Currently, my gotoNextPage function doesn't even find the button, so I thought I'd entered the selector wrong...

Then when I went to find the selector, I realized all buttons have the same one anyway...
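Three things stop gotoNextPage from working:

- The callback passed to page.evaluate has no return.
- Even with a return, a DOM element cannot be handed back to Node from page.evaluate, because the return value is serialized.
- page.click expects a selector string, not an element.

Since the 'next page' button comes after the page 1-5 buttons, one option is to get every matching element handle with page.$$ and click the last one. Below is a sketch that assumes the next button really is the last match; this has not been verified against the live page:

```javascript
// Click the last element matching `selector`, e.g. a "next" button that follows
// numbered page buttons sharing the same class. `page` follows Puppeteer's API:
// page.waitForSelector(selector) and page.$$(selector) -> ElementHandle[].
async function clickLastMatch(page, selector) {
  await page.waitForSelector(selector);
  const handles = await page.$$(selector);
  if (handles.length === 0) {
    throw new Error(`No elements match ${selector}`);
  }
  // ElementHandle.click() scrolls the element into view if needed, then clicks it.
  await handles[handles.length - 1].click();
  return handles.length;
}
```

In the question's main() this would be await clickLastMatch(page, ".button.btn.btn--pagination").

Note that .button matches a class named button. If the pagination controls are <button> elements without that class, the selector would need to start with button instead. This is only a guess at why the original selector found nothing, so check the real markup in DevTools.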

My HTML knowledge is basically nonexistent, but I want to finish this project. All help is very appreciated.

Bonus: my initItemArea function doesn't work when I call it as a function like that, so I hardcoded it into main()...

I'll be diving deep into this problem later on, as I've seen it before, but any quick answers / direction would be awesome.
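On the bonus question: initItemArea stores the result of page.evaluate in a local constant but never returns it, so the caller receives undefined.

Below is a minimal sketch of the fix. It assumes only Puppeteer's page.evaluate(fn, ...args). USERNAME_SELECTOR is a hypothetical constant holding the selector from the question:

```javascript
// Selector taken from the question's code.
const USERNAME_SELECTOR = ".tc--g.m--l--1.ellipses";

// Return the text of every username element on the current page.
// `page` is expected to provide Puppeteer's page.evaluate(fn, ...args).
const initItemArea = async (page) => {
  // Returning the evaluated array is what the original version was missing.
  return page.evaluate((selector) => {
    return Array.from(document.querySelectorAll(selector)).map((x) => x.textContent);
  }, USERNAME_SELECTOR);
};
```

With that change, main() can use const itemArea = await initItemArea(page); instead of repeating the evaluate call.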

Thanks a lot.

Answer:

Whenever you're dealing with buttons and scrolling, it's a good idea to think about where the data is coming from. It's usually delivered to the front end by a JSON API (you can spot the requests in the Network tab of your browser's DevTools), so you might as well hit that API directly rather than mess with the DOM.


const url = maxId => `https://poshmark.com/vm-rest/channel_groups/category/channels/category/collections/post?request={"filters":{"department":"Men","category_v2":"Jackets_&_Coats","inventory_status":["available"]},"sort_by":"like_count","facets":["color","brand","size"],"experience":"all","sizeSystem":"us","max_id":"${maxId}","count":"48"}&summarize=true&pm_version=226.1.0`;

(async () => {
  const usernames = [];

  for (let maxId = 1; ; maxId++) {
    const response = await fetch(url(maxId)); // Node 18 or install node-fetch

    if (!response.ok) {
      throw Error(response.statusText);
    }

    const payload = await response.json();

    if (payload.error) {
      break;
    }

    usernames.push(...payload.data.map(e => e.creator_username));
  }

  console.log(usernames.slice(0, 10));
  console.log("usernames.length", usernames.length);
})()
  .catch(err => console.error(err));
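The request query parameter above is a JSON document written directly into the URL. Building it from an object with JSON.stringify and letting URLSearchParams do the escaping avoids one pitfall. The & in "Jackets_&_Coats" would otherwise be read as a parameter separator by standard query-string parsing.

This is a sketch only. The endpoint and field names are copied from the answer, belong to an undocumented internal API, and may change. buildUrl is a hypothetical replacement for the url helper:

```javascript
// Endpoint and request fields are copied from the answer above; not a public API.
const BASE =
  "https://poshmark.com/vm-rest/channel_groups/category/channels/category/collections/post";

// Build the listing URL for a given page cursor (max_id).
function buildUrl(maxId) {
  const request = {
    filters: {
      department: "Men",
      category_v2: "Jackets_&_Coats",
      inventory_status: ["available"],
    },
    sort_by: "like_count",
    facets: ["color", "brand", "size"],
    experience: "all",
    sizeSystem: "us",
    max_id: String(maxId),
    count: "48",
  };
  const url = new URL(BASE);
  // searchParams.set() percent-encodes the JSON, including the "&" in the category.
  url.searchParams.set("request", JSON.stringify(request));
  url.searchParams.set("summarize", "true");
  url.searchParams.set("pm_version", "226.1.0");
  return url.toString();
}
```

In the loop, fetch(buildUrl(maxId)) would take the place of fetch(url(maxId)).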

The response blob has a ton of additional data.

If I were to use code like this, I would add a significant delay between requests to avoid rate limiting or blocking.
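Following that advice, here is a minimal sketch of the same pagination loop with a fixed pause between requests. Two assumptions to note:

- fetchPage is a hypothetical function that resolves to the parsed JSON for one page, which keeps the loop independent of fetch.
- The 2-second default delay is an arbitrary choice, not a known Poshmark rate limit.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Collect creator usernames page by page, pausing between requests.
// fetchPage(maxId) must resolve to the parsed JSON payload for that page.
async function collectUsernames(fetchPage, { delayMs = 2000, maxPages = Infinity } = {}) {
  const usernames = [];
  for (let maxId = 1; maxId <= maxPages; maxId++) {
    const payload = await fetchPage(maxId);
    // Stop when the API reports an error (as in the answer) or returns no data array.
    if (payload.error || !Array.isArray(payload.data)) {
      break;
    }
    usernames.push(...payload.data.map((e) => e.creator_username));
    await sleep(delayMs);
  }
  return usernames;
}
```

With Node 18's global fetch, fetchPage could be an async function that calls fetch(url(maxId)), throws if response.ok is false, and returns response.json().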

Answer:

You can try selecting the buttons by their position on the page.

For example, you can select the first button using the following CSS selector:

.button.btn.btn--pagination:nth-child(1)

To select the second button:

.button.btn.btn--pagination:nth-child(2)

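Got the idea? :) Keep in mind that :nth-child(n) starts at 1 and counts every child element of the parent, not only the ones with this class. Any other elements placed before the buttons in the same container will shift the numbering.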

You can refactor your gotoNextPage function to use this approach. Consider this example:

const gotoNextPage = async (page, buttonIndex) => {
  // Select the button using its (1-based) position in the page
  const selector = `.button.btn.btn--pagination:nth-child(${buttonIndex})`;
  await page.waitForSelector(selector);

  // Click on the button: page.click() takes a selector string,
  // so there is no need to fetch the element with page.evaluate()
  await page.click(selector);
  console.log("Next Page Loading");
};
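Position-based selectors can break as the numbered buttons change from page to page. An alternative is to pick the button by its visible text through element handles.

The sketch below assumes the next button's text contains "Next". That is a guess, so check the real label in DevTools first. clickByText is a hypothetical helper name:

```javascript
// Click the first element matching `selector` whose text matches `pattern`.
// `page` follows Puppeteer's API: page.$$(selector) -> ElementHandle[],
// handle.evaluate(fn) runs fn(element) in the page, handle.click() clicks it.
async function clickByText(page, selector, pattern) {
  const handles = await page.$$(selector);
  for (const handle of handles) {
    const text = await handle.evaluate((el) => el.textContent.trim());
    if (pattern.test(text)) {
      await handle.click();
      return text;
    }
  }
  return null; // no matching button, e.g. on the last page
}
```

A call like await clickByText(page, ".button.btn.btn--pagination", /next/i) returns null when no such button exists. The caller can use that as the signal to stop paginating.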