Web Scraping a list of links from Tripadvisor

Time:05-16

I'm trying to create a web scraper that returns a list of links to the individual objects listed on a Tripadvisor page. Could someone help me correct this code so that it produces such a list of links?

I will be grateful for any help.

My code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
}

# Read one listing-page URL per line, skipping blank lines
with open('pages.csv') as restaurantLinks:
    urls = [url.strip() for url in restaurantLinks if url.strip()]


restlist = []
for link in urls:
    print("Opening link: " + str(link))
    response = requests.get(link, headers=header)
    soup = BeautifulSoup(response.text, 'html.parser')

    productlist = soup.find_all('div', class_='cNjlV')
    print(productlist)

    productlinks =[]
    for item in productlist:
        for a in item.find_all('a', href=True):
            productlinks.append('https://www.tripadvisor.com' + a['href'])

    print(productlinks)

    restlist.append(productlinks)

print(restlist)

df = pd.DataFrame(restlist)
df.to_csv('links.csv')

CodePudding user response:

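Calling append() adds each page's list of links to restlist as a single nested element. To add the links individually and end up with one flat list, use extend() instead: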

restlist.extend(productlinks)
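The difference between the two methods can be seen in a minimal sketch that needs no network access. The helper names and the URLs below are hypothetical and only stand in for the per-page results of the scraper:

```python
# Hypothetical per-page results; the URLs are made up for illustration.
page_results = [
    ["https://www.tripadvisor.com/a", "https://www.tripadvisor.com/b"],
    ["https://www.tripadvisor.com/c"],
]


def collect_nested(pages):
    """Collect with append(): each page's list becomes one element."""
    out = []
    for links in pages:
        out.append(links)
    return out


def collect_flat(pages):
    """Collect with extend(): each link becomes its own element."""
    out = []
    for links in pages:
        out.extend(links)
    return out


nested = collect_nested(page_results)
flat = collect_flat(page_results)

print(len(nested))  # one entry per page
print(len(flat))    # one entry per link
```

With the nested result, pd.DataFrame() would build one row per page, while the flat result gives one row per link.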
Example:
import pandas as pd
import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
}
urls = ['https://www.tripadvisor.com/Attractions-g187427-Activities-oa60-Spain.html']
restlist = []

for link in urls:
    print("Opening link: " + str(link))
    response = requests.get(link, headers=header)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Select every <a> that contains an <h3> and build its absolute URL
    restlist.extend(['https://www.tripadvisor.com' + a['href'] for a in soup.select('a:has(h3)')])
    
df = pd.DataFrame(restlist)
df.to_csv('links.csv', index=False)
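
Prefixing the domain by string concatenation works as long as every href is a site-relative path. As an optional variation, not part of the answer above, the standard library's urllib.parse.urljoin resolves relative paths and leaves absolute URLs untouched. The sketch below also drops repeated links, on the assumption that a page might link to the same object more than once. The helper name absolute_unique and the sample href values are hypothetical:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.tripadvisor.com"


def absolute_unique(hrefs, base=BASE_URL):
    """Resolve hrefs against base and drop duplicates, keeping first-seen order."""
    resolved = (urljoin(base, href) for href in hrefs)
    # dict preserves insertion order, so this de-duplicates without reordering
    return list(dict.fromkeys(resolved))


# Hypothetical href values, not taken from a real page
sample = [
    "/Attraction_Review-example-1.html",
    "/Attraction_Review-example-1.html",
    "https://www.tripadvisor.com/Attraction_Review-example-2.html",
]

links = absolute_unique(sample)
print(links)
```

In the loop above, the list comprehension passed to restlist.extend() could be replaced by absolute_unique(a['href'] for a in soup.select('a:has(h3)')).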