Can I simplify this code to avoid the type error for reading properties?

Time:05-11

I am writing this code to scrape a webpage. I need to extract specific information from the site, and there are a lot of fields to scrape.

The code works, but when I run it repeatedly it fails on some of the lines, e.g. line 20 and line 24.

Below is the code:

const browser = await puppeteer.launch()
const page = await browser.newPage();

await page.goto("https://startupjobs.asia/job/search?q=&job-list-dpl-page=1", {timeout: 3000000})

const b = (await page.$x("/html/body/div[1]/div[3]/div[1]/div/div[1]/ul/li[1]/div/div[1]/div/h5/a"))[0]
b.click()

//const elm = await page.$('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5');
//const text = await page.evaluate(elm => elm.textContent, elm[0]);

const [el1] = await page.$x('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5');
const job_name = await (await el1.getProperty('textContent')).jsonValue();

const [el2] = await page.$x('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[2]/div/h6[1]/a');
const company = await (await el2.getProperty('textContent')).jsonValue();

const [el3] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[3]/p');
const job_type = await (await el3.getProperty('textContent')).jsonValue();

const [el4] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');
const salary = await (await el4.getProperty('textContent')).jsonValue();

const [el5] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[4]/p');
const skills = await (await el5.getProperty('textContent')).jsonValue();

There are about 13 fields I need to scrape.

The error I get is:

const salary = await (await el4.getProperty('textContent')).jsonValue();
TypeError: Cannot read properties of undefined (reading 'getProperty')

CodePudding user response:

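page.$x resolves to an empty array when nothing matches the XPath, so the destructured variable (el4 here) is undefined and calling getProperty on it throws. The quick fix is to check that the destructured ElementHandle actually exists before calling getProperty on it, for example: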

const [el4] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');
const salary = !el4 ? 'Not Found' : await (await el4.getProperty('textContent')).jsonValue();
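
The same check can be pulled into a small helper so each field becomes a single call. This is only a sketch: the name textOrDefault and the trimming of whitespace are my own additions, not part of the original code, and it assumes a Puppeteer version where page.$x is still available (as in the question).

```javascript
// Hypothetical helper (name and trimming are assumptions, not from the question).
// Returns the textContent of the first node matching `xpath`,
// or `fallback` when the XPath matches nothing.
async function textOrDefault(page, xpath, fallback = 'Not Found') {
  // page.$x resolves to an array of ElementHandles; it is empty on no match.
  const [el] = await page.$x(xpath);
  if (!el) return fallback;

  const prop = await el.getProperty('textContent');
  const text = await prop.jsonValue();

  // Trim surrounding whitespace; fall back if the value is not a string.
  return typeof text === 'string' ? text.trim() : fallback;
}
```

With that in place, a field such as salary would read: const salary = await textOrDefault(page, xpath); where xpath is the same string passed to page.$x above.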

A less repetitive script would keep the XPaths in an array and loop over them:

const elementsToFind = [
    { xpath: '//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5', propName: 'job_name' },
    { xpath: '//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[2]/div/h6[1]/a', propName: 'company' },
    // ...
];
const results = {};
for (const { xpath, propName } of elementsToFind) {
    const [el] = await page.$x(xpath);
    results[propName] = !el ? 'Not Found' : await (await el.getProperty('textContent')).jsonValue();
}

You can then iterate through the results object to use the scraped values.
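
As a rough illustration of that last step, the sketch below walks a results object with Object.entries and collects the fields that came back as 'Not Found'. The summarize function and the sample values are hypothetical, made up here so the snippet runs on its own; in the real script results would be the object filled in by the loop.

```javascript
// Sample data standing in for the `results` object built by the loop
// (these values are invented for illustration only).
const results = { job_name: 'Backend Engineer', company: 'Not Found' };

// Hypothetical helper: formats each field and lists the ones that were missing.
function summarize(results) {
  const lines = [];
  const missing = [];
  for (const [name, value] of Object.entries(results)) {
    lines.push(`${name}: ${value}`);
    if (value === 'Not Found') missing.push(name);
  }
  return { lines, missing };
}

const { lines, missing } = summarize(results);
console.log(lines.join('\n'));
if (missing.length > 0) {
  console.log(`Fields not found: ${missing.join(', ')}`);
}
```

Listing the missing fields makes it easier to see which XPaths stopped matching when the page layout differs between job postings.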
