I created a web scraper that reads the text on news sites and determines the most popular words used during a news cycle. It works fine initially, but whenever I refresh the page, the data is counted again and the same entries are inserted into the rendered output a second time.
[ [ "War", 33 ], [ "Ukraine", 30 ], [ "York", 30 ], [ "Brown", 29 ],
becomes duplicated on refresh:
[ [ "War", 66 ], [ "Ukraine", 60 ], [ "York", 60 ], [ "Brown", 58 ],
I think the session data is carrying over with each refresh and the program keeps pushing the same words into my array.
Here's my code:
const PORT = process.env.PORT || 8000
const express = require('express')
const axios = require('axios')
const cheerio = require('cheerio')
const { response } = require('express')
const app = express()
const newspapers = [
    {
        name: 'alternet',
        address: 'https://www.alternet.org/',
        base: ''
    }
]
const wordCount = []
const count = {}
const countArr = []
// theCount and sortCount are called below; their definitions are not included in this post
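newspapers.forEach(newspaper => {
    axios.get(newspaper.address)
        .then(response => {
            const html = response.data
            const $ = cheerio.load(html)
            $("a", html).each(function () {
                let title = $(this).text()
                // Assumed reconstruction: the lines that build titleWords and open this loop are missing from the post
                const titleWords = title.split(' ')
                for (let i = 0; i < titleWords.length; i++) {
                    wordCount.push(titleWords[i])
                }
            })
        })
})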
app.get('/', (req, res) => {
    res.json('Welcome to my Climate Change News API')
})
app.get('/news', (req, res) => {
    theCount(wordCount, count)
    sortCount(countArr, count)
    res.json(countArr)
})
app.listen(PORT, () => console.log(`server running on PORT ${PORT}`))
CodePudding user response:
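Try this:
app.get('/news', (req, res) => {
    // countArr is declared with const, so empty it in place instead of reassigning it
    countArr.length = 0
    // count persists between requests as well (the question shows 33 becoming 66), so clear its keys too
    Object.keys(count).forEach(key => delete count[key])
    theCount(wordCount, count)
    sortCount(countArr, count)
    res.json(countArr)
})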
The reason you're getting duplicates:
When you send a GET request to /news, everything outside the route handler (here, the callback passed to app.get) has already run once, when the API first started, and won't run again.
So when you call /news, only the code inside that handler executes. Since countArr is not reset to an empty array inside the handler, the program just keeps appending data to the existing array.
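The behaviour described above can be reproduced without Express. The following is a minimal sketch, assuming a hypothetical handleNews function that stands in for the /news handler body and a hard-coded wordCount array that stands in for the scraped words; the counting logic is a guess at what theCount and sortCount do, since they are not shown in the question.

```javascript
// Module-level state: created once, when the file is first loaded.
const wordCount = ['War', 'Ukraine', 'War'] // stand-in for the scraped words
const count = {}
const countArr = []

// Hypothetical stand-in for the body of the /news handler.
// It never resets count or countArr, so both keep growing between calls.
function handleNews() {
  for (const word of wordCount) {
    count[word] = (count[word] || 0) + 1
  }
  for (const word of Object.keys(count)) {
    countArr.push([word, count[word]])
  }
  return JSON.stringify(countArr)
}

const firstResponse = handleNews()  // simulates the first page load
const secondResponse = handleNews() // simulates a refresh

console.log(firstResponse)  // [["War",2],["Ukraine",1]]
console.log(secondResponse) // [["War",2],["Ukraine",1],["War",4],["Ukraine",2]]
```

The second call both doubles the counts and appends a second set of entries, because neither object was recreated between calls.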
Why this solution works:
countArr is emptied at the start of each GET /news call, so every response is built from a fresh array.
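One caveat worth checking: in the question, countArr is declared with const, and assigning a new array to a const binding throws a TypeError at runtime. A small sketch of the difference between reassigning and clearing in place, using a made-up one-entry array:

```javascript
const countArr = [['War', 33]]

// Attempting to rebind a const variable throws "Assignment to constant variable."
let reassignError = null
try {
  countArr = []
} catch (err) {
  reassignError = err
}
console.log(reassignError instanceof TypeError) // true

// Clearing the existing array in place is allowed, because the binding itself does not change.
countArr.length = 0
console.log(countArr.length) // 0
```

Declaring the array with let instead of const, or creating it inside the handler, would avoid the problem as well.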
CodePudding user response:
OK, so I figured out what was going on. My count object and my countArr array were declared at the top of the script, so they persisted between requests, and every refresh added the same counts and entries to them again. What ultimately solved it was initializing both inside the app.get('/news') handler, which now looks like this:
app.get('/news', (req, res) => {
    const count = {}
    const countArr = []
    theCount(wordCount, count)
    sortCount(countArr, count)
    res.json(countArr)
})
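The same idea can be taken one step further by having a single function build and return its own state, so nothing is shared between requests. This is only a sketch: topWords is a hypothetical helper, not the theCount and sortCount from the question (which are not shown), and the sample words are made up.

```javascript
// Hypothetical helper: counts words and returns [word, count] pairs, most frequent first.
// All state lives inside the function, so repeated calls cannot accumulate.
function topWords(words) {
  const count = new Map() // a Map avoids clashes with Object.prototype keys such as "constructor"
  for (const word of words) {
    count.set(word, (count.get(word) || 0) + 1)
  }
  return [...count.entries()].sort((a, b) => b[1] - a[1])
}

const scraped = ['Ukraine', 'War', 'War', 'York', 'War', 'Ukraine'] // stand-in for wordCount
const first = topWords(scraped)
const second = topWords(scraped)

console.log(JSON.stringify(first))  // [["War",3],["Ukraine",2],["York",1]]
console.log(JSON.stringify(second)) // [["War",3],["Ukraine",2],["York",1]]
```

With a helper like this, the handler body could shrink to a single res.json(topWords(wordCount)) call.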