I am creating an API using Puppeteer. The goal is to get data from football games to create a mobile app.
I wrote a script with Puppeteer that works and returns the data I want. The problem is that I want the data for every game on the page, but it only returns the data for one game.
The site I am scraping is https://www.flashscore.com.br.
This is the service file:
import puppeteer from "puppeteer";

class NextGamesService {
  async execute() {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('https://www.flashscore.com.br');
    const games = await page.$$('#live-table > section > div > div');
    const game1 = []
    for (const game of games) {
      const time = await page.evaluate((el) => el.querySelector('#g_1_IcYs8jIk > div.event__time')?.textContent, game)
      const home = await page.evaluate((el) => el.querySelector('div.event__participant.event__participant--home')?.textContent, game)
      const away = await page.evaluate((el) => el.querySelector('div.event__participant.event__participant--away')?.textContent, game)
      const league = await page.evaluate((el) => el.querySelector('div.icon--flag.event__title.fl_81 > div > span.event__title--name')?.textContent, game)
      game1.push({ time, home, away, league });
    }
    return ({ game1 })
  }
}

export { NextGamesService }
This is the controller:
import { NextGamesService } from "./nextGamesService";
import { Request, Response } from "express";

class NextGamesController {
  async handle(req: Request, res: Response) {
    const nextGamesService = new NextGamesService();
    const games = await nextGamesService.execute()
    return res.json(games)
  }
}

export { NextGamesController }
The JSON response I get:
{
  "game1": [
    {
      "time": "11:30",
      "home": "Dortmund",
      "away": "Augsburg",
      "league": "Bundesliga"
    }
  ]
}
Answer:
Your selector grabs the container for the events, not the events themselves. .event__match is the container for a single game.
Some events don't have times (for example, games that are currently live), so you can fall back to .event__stage for those if you want.
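One way to express that fallback is a small helper that tries several selectors in order. This is only a sketch: firstText is a hypothetical name, not part of Puppeteer, and because callbacks passed to $$eval run inside the page, the helper would have to be defined inside that callback to be usable there.

```javascript
// Hypothetical helper: returns the trimmed text of the first selector
// that matches an element with non-empty text, or null if none does.
function firstText(el, selectors) {
  for (const selector of selectors) {
    const text = el.querySelector(selector)?.textContent?.trim();
    if (text) return text;
  }
  return null;
}

// Possible use inside the page context, where `e` is one .event__match element:
// time: firstText(e, [".event__time", ".event__stage"])
```

An empty string counts as "no text" here, so an empty .event__time element still falls through to .event__stage.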
Since the page load is slow, I'm blocking some resources to improve speed a bit.
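The blocking rule can be pulled into a plain predicate, which makes it easy to adjust and to test without a browser. This is a sketch under assumptions: shouldAbort and its allowedHost parameter are hypothetical names, and the "flashscore" substring check is a rough filter, not a strict host match.

```javascript
const BLOCKED_TYPES = new Set(["image", "font", "stylesheet"]);

// Hypothetical helper: true when a request should be aborted, either
// because it goes to another site or because its resource type is blocked.
function shouldAbort(url, resourceType, allowedHost = "flashscore") {
  return !url.includes(allowedHost) || BLOCKED_TYPES.has(resourceType);
}

// Possible wiring (needs `await page.setRequestInterception(true)` first):
// page.on("request", req =>
//   shouldAbort(req.url(), req.resourceType()) ? req.abort() : req.continue()
// );
```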
Don't forget to handle errors and close your browser properly to avoid a memory leak. execute should probably be a static method, but I usually avoid classes and abstractions with Puppeteer until I have working code.
const puppeteer = require("puppeteer"); // ^19.1.0

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setRequestInterception(true);
  const blocked = ["image", "font", "stylesheet"];
  page.on("request", req => {
    if (
      !req.url().includes("flashscore") ||
      blocked.includes(req.resourceType())
    ) {
      req.abort();
    }
    else {
      req.continue();
    }
  });
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector(".event__time");
  const events = await page.$$eval(".event__match", els =>
    els.map(e => {
      const text = x => e.querySelector(x)?.textContent.trim();
      return {
        time: text(".event__time") /*|| text(".event__stage")*/,
        home: text(".event__participant--home"),
        away: text(".event__participant--away"),
        league: text(".event__title--name")
      };
    })
  );
  console.log(events);
  console.log(events.length);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
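If you later move this back into the service class from the question, the cleanup can live in a small wrapper so the browser is closed whether the scrape succeeds or throws. This is a minimal sketch: withBrowser is a hypothetical helper, and launch is assumed to be any function that resolves to an object with a close() method, such as () => puppeteer.launch().

```javascript
// Hypothetical helper: launches a browser, runs `task` with it, and always
// closes the browser afterwards. Errors from `task` still reach the caller.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close(); // runs on success and on error
  }
}

// Possible use in the service (assumes puppeteer is installed and imported):
// class NextGamesService {
//   static execute() {
//     return withBrowser(() => puppeteer.launch(), async browser => {
//       const page = await browser.newPage();
//       // navigate and collect the events here, as in the script above
//     });
//   }
// }
```

Passing launch in as a function keeps the helper independent of Puppeteer, so it can be exercised with a stub in tests.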