Why is BeautifulSoup returning the same information over and over again?

Time:12-15

When I try to scrape a website over multiple pages, BeautifulSoup returns the content of the first page for every page in the range. The same results are repeated again and again.

import pandas as pd
import requests
from bs4 import BeautifulSoup

data=pd.DataFrame()
for i in range(1,10):
  headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
  url="https://www.collegesearch.in/engineering-colleges-india".format(i)

  r = requests.get(url, headers=headers)
  soup = BeautifulSoup(r.content, 'html5lib') 
    
  #clg url and name
  clg=soup.find_all('h2', class_='media-heading mg-0')

  #other details
  details=soup.find_all('dl', class_='dl-horizontal mg-0')

  _dict={'clg':clg,'details':details}

  df=pd.DataFrame(_dict)

  data=data.append(df,ignore_index=True)

CodePudding user response:

This is not a BeautifulSoup issue. Check your loop: you never change the page. Because the URL string contains no {} placeholder, .format(i) leaves it unchanged and the URL is always the same:

https://www.collegesearch.in/engineering-colleges-india
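To see why every iteration requests the same address, here is a minimal sketch using only the standard library. The variable names are made up for illustration, and the ?page= query parameter is the one the fix below relies on; whether the site honors it is taken from this answer, not verified here.

```python
base = "https://www.collegesearch.in/engineering-colleges-india"

# No "{}" placeholder: str.format() has nothing to fill in,
# so the argument is silently ignored and the string comes back unchanged.
unchanged = [base.format(i) for i in range(1, 4)]

# With a placeholder, the counter actually ends up in the URL.
paged = [(base + "?page={}").format(i) for i in range(1, 4)]

print(len(set(unchanged)))  # number of distinct URLs without a placeholder
for url in paged:
    print(url)
```

Python does not raise an error when format() receives more arguments than the string has placeholders, which is why the original loop ran without complaint.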

So change your code and use your loop counter as the value of the page parameter:

for i in range(1,10):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    url=f"https://www.collegesearch.in/engineering-colleges-india?page={i}"
    print(url)
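Putting the fix back into the full scraper could look roughly like the sketch below. It is one possible arrangement, not the only one: the helper names (page_url, parse_page, scrape) are hypothetical, the CSS classes are copied from the question and may no longer match the live site, and the built-in "html.parser" is used instead of html5lib only to avoid an extra dependency. It also collects one DataFrame per page and joins them with pd.concat, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.collegesearch.in/engineering-colleges-india"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # shortened; use the full string from the question if needed


def page_url(page):
    """Build the URL for one results page (assumes the ?page= parameter from the answer)."""
    return f"{BASE_URL}?page={page}"


def parse_page(html):
    """Extract college names and details from one page of HTML as plain text."""
    soup = BeautifulSoup(html, "html.parser")
    names = soup.find_all("h2", class_="media-heading mg-0")
    details = soup.find_all("dl", class_="dl-horizontal mg-0")
    # zip() pairs entries by position and stops at the shorter list,
    # so unequal counts do not raise an error when building the frame.
    rows = [
        {"clg": n.get_text(strip=True), "details": d.get_text(" ", strip=True)}
        for n, d in zip(names, details)
    ]
    return pd.DataFrame(rows, columns=["clg", "details"])


def scrape(pages):
    """Fetch each page once and combine the per-page frames."""
    frames = []
    with requests.Session() as session:
        session.headers.update(HEADERS)
        for page in pages:
            response = session.get(page_url(page), timeout=10)
            response.raise_for_status()  # fail loudly on HTTP errors
            frames.append(parse_page(response.text))
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    data = scrape(range(1, 10))
    print(data.head())
```

Storing the extracted text rather than the Tag objects keeps the DataFrame easy to export, and separating parse_page from the network call makes it possible to check the parsing against a saved HTML snippet.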

You may also want to take a short look at the Python tutorial section on string formatting: https://docs.python.org/3/tutorial/inputoutput.html
