What is the correct approach on making a For Loop with a Cheerio Object?

Time:03-07

Simply put, I'm scraping data from a website and storing it in a database.

The relevant fields are links, names, prices and item condition.

The way I'm handling this right now is to iterate through each element and push its value into the respective list, then add the lists to the database with a for loop. For example:

var names = [];
$(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .valtitle.lovewrap.padr4 .underlinedlinks").each(function(){
    names.push($(this).text());
});
...
for (x in names){
    var sql = "REPLACE INTO `item` (`link`, `title`, `price`, `date`, `item_condition`, `country`) VALUES (?)";
    var values = [links[x], names[x], prices[x], '', states[x], cc];

    con.query(sql, [values], function(err, result){
        if (err) throw err;
    });
}

This is very naive, as it assumes all elements exist and that they align perfectly. That has worked well so far, but I've now noticed that some listings on the website I'm scraping do not have an item condition element. Those get skipped and the lists go out of sync, resulting in the wrong values being paired up.
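
To make the failure concrete, here is a minimal sketch in plain JavaScript with no scraping involved. The `listings` data is made up for illustration; it only stands in for a page where the second listing has no condition element.

```javascript
// Hypothetical listings: the second one has no condition element.
const listings = [
  { name: "Item A", condition: "New" },
  { name: "Item B" },
  { name: "Item C", condition: "Used" },
];

// Parallel lists, as in the approach above: a missing value is never pushed.
const names = [];
const states = [];
for (const listing of listings) {
  names.push(listing.name);
  if (listing.condition !== undefined) states.push(listing.condition);
}

// Pairing by index shifts every later condition up by one position.
const pairedByIndex = names.map((name, i) => ({
  name,
  condition: states[i] ?? null,
}));

// Building one record per listing keeps each value with its own item.
const pairedByListing = listings.map((listing) => ({
  name: listing.name,
  condition: listing.condition ?? null,
}));

console.log(pairedByIndex);
console.log(pairedByListing);
```

In the index-based pairing, "Item B" picks up the condition that belongs to "Item C", while the per-listing version leaves it as `null`.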

I understand the answer I'm looking for has to do with the .each function, but I'm not exactly sure how to go about it. I suppose I have to go to the highest common point, that being .midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2, and work from there, adding a NULL value whenever an element is not found.

Below is the full (relevant) code:

const $ = c.load(response.data);

        $(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .splittable .splittablecell1 .padr2.bhserp-txt1.bhserp-new1").each(function(){
            var fixedStr = $(this).text().replace(/,|£|\$|\s|[(GBP)]|[(USD)]/g, '');
            prices.push(Number(fixedStr));
        });

        $(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .valtitle.lovewrap.padr4 .underlinedlinks").each(function(){
            names.push($(this).text());
        });

        $(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .splittable .splittablecell1.bhserp-txt1 .padl1.labinfo").each(function(){
            if ($(this)){
                states.push($(this).text());
            }
            else{
                console.log("Mistake here, pick me up!"); // I understand what I'm doing here does not make sense and is wrong as I've stated, but since that's what made me realize what I needed to do, I'm leaving it.
                states.push("None");
            }
        });

        $(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .valtitle.lovewrap.padr4 .underlinedlinks").each(function(){
            var tempLink = $(this).attr('href');
            var fixedLinks = tempLink.split("=");
            var fixedLinks = fixedLinks[1].split("&");
            links.push("https://www.ebay.co.uk/itm/" + fixedLinks[0]);
        });
...
con.connect(function(err){
            if (err) throw err;
            console.log("Connected!");
            for (x in names){
                var sql = "REPLACE INTO `item` (`link`, `title`, `price`, `date`, `item_condition`, `country`) VALUES (?)";
                var values = [links[x], names[x], prices[x], '', states[x], cc];
            
                con.query(sql, [values], function(err, result){
                    if (err) throw err;
                    });
            }
        });

CodePudding user response:

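You should iterate over the elements that wrap each item and read every field from inside that element. If you collect prices separately from links, the lists can drift apart as soon as one item is missing a field. Something like: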

for(let div of $('.product').get()){
  let item = {
    link: $(div).find('a').attr('href'),
    price: $(div).find('.price').text(),
  }
  // insert item into the db
}

CodePudding user response:

pguardiario's answer worked perfectly. Here is the code I ended up with, for future reference:

for(let div of $('.midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2').get()){
        
        var tempLink = $(div).find('.underlinedlinks').attr('href');
        var fixedLinks = tempLink.split("=");
        var fixedLinks = fixedLinks[1].split("&");

        var fixedStr = $(div).find('.padr2.bhserp-txt1.bhserp-new1').text().replace(/,|£|\$|\s|[(GBP)]|[(USD)]/g, '');
        
        let item = {
            link: "https://www.ebay.co.uk/itm/" + fixedLinks[0],
            name: $(div).find('.valtitle.lovewrap.padr4 .underlinedlinks').text(),
            price: Number(fixedStr),
            condition: $(div).find('.padl1.labinfo').text()

        }
}
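
One caveat with the loop above: as far as I know, cheerio's `.attr()` gives `undefined` and `.text()` gives an empty string when nothing matches, so a listing without a link would throw on `.split`, and a missing condition would end up as `''` rather than NULL. Below is a rough sketch of guards for that, in plain JavaScript. The helper names (`itemIdFromHref`, `textOrNull`, `toRow`) are made up for illustration, and the href shape is only assumed from the split logic in the code above.

```javascript
// Hypothetical helpers; none of these names come from the original code.

// Mirrors the split("=")[1].split("&")[0] logic above, but returns null
// instead of throwing when the href is missing or contains no "=".
function itemIdFromHref(href) {
  if (typeof href !== "string") return null;
  const afterEquals = href.split("=")[1];
  if (afterEquals === undefined) return null;
  return afterEquals.split("&")[0];
}

// Turns a missing, empty, or whitespace-only string into null.
function textOrNull(text) {
  const trimmed = (text ?? "").trim();
  return trimmed === "" ? null : trimmed;
}

// Builds the values in the column order of the REPLACE statement:
// link, title, price, date, item_condition, country.
function toRow(item, country) {
  return [item.link, item.name, item.price, "", textOrNull(item.condition), country];
}
```

Inside the loop, something like `con.query(sql, [toRow(item, cc)], callback)` would then reuse the same statement as in the question. This assumes the MySQL driver in use writes a JavaScript `null` as SQL NULL, which is worth checking against its documentation.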