Reading URLs from CSV - Scraping Data - Saving to Same CSV File

Time: 07-13

I have a CSV file with over 3,400 URLs from one website, and I need to access each of them with cloudscraper to pull a specific string. For simplicity, call the URL PRODUCT and the string COLOR. Once I retrieve the COLOR of a PRODUCT, I want to save it in the row next to that PRODUCT in the CSV file. I've spent several hours going through the docs and other questions about scraping, but how to combine all of this still escapes me. Here is the code I have so far; I would really appreciate help piecing it together and making it work.

from bs4 import BeautifulSoup
import requests
import csv
import cloudscraper

def scrape(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    color = soup.select('.Property--value').text    

with open('scrape.csv', 'w', newline = '') as data:
    urls = csv.reader(data, delimiter=',')
    scraper = cloudscraper.create_scraper(delay=10, browser='chrome') 
    info = scraper.get(urls)    
    for row in urls:
        URL_GO = row[2]
        scrape(URL_GO)
        write = csv.writer(data, delimiter=' ')
        write.writerow(color)
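For reference, the csv-module approach above can be made to work by reading all the rows first and writing afterwards, instead of reading and writing the same open handle. A minimal sketch (the function names, the output filename, and the assumption that the URL sits in the first column are mine, not from the original code):

```python
import csv
from bs4 import BeautifulSoup


def extract_color(html):
    # Pull the text of the first .Property--value element, '' when absent
    element = BeautifulSoup(html, 'html.parser').select_one('.Property--value')
    return element.get_text(strip=True) if element else ''


def add_color_column(in_path, out_path, fetch):
    # fetch(url) must return the page HTML; the URL is assumed to be
    # in the first column of each row
    with open(in_path, newline='') as f:
        rows = list(csv.reader(f))          # read everything up front
    for row in rows:
        row.append(extract_color(fetch(row[0])))
    with open(out_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)


# Wiring it to cloudscraper would look roughly like (untested sketch):
#   import cloudscraper
#   scraper = cloudscraper.create_scraper(browser='chrome')
#   add_color_column('scrape.csv', 'scrape_with_colors.csv',
#                    lambda url: scraper.get(url).content)
```

Splitting fetching from parsing also makes the parsing half easy to test without any network access.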

Working Code Below:

from bs4 import BeautifulSoup
import pandas as pd
import cloudscraper

def scrape(url):
    # cloudscraper handles Cloudflare challenges that plain requests cannot
    scraper = cloudscraper.create_scraper(delay=0.10, browser='chrome')
    response = scraper.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    div = soup.find("div", {'class': 'Property--value'})
    content = str(div)  # note: str(div) keeps the surrounding <div> tags
    print(content)
    return content

df = pd.read_csv('scrape.csv')
for x in df.values:
    # find the row whose URL matches and store the scraped value beside it
    index = df[df['URL'] == x[0]].index.item()
    df.at[index, 'COLOR'] = scrape(x[0])

df.to_csv('new_scrape.csv', index=False)
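The lookup loop above re-scans the frame to find each row's index even though it is iterating the same frame. pandas can express the whole thing with DataFrame.apply; a sketch (the add_colors wrapper is my own name, not part of the original code):

```python
import pandas as pd


def add_colors(df, scrape):
    # scrape(url) -> color string; apply it to every URL and store the
    # results in a new COLOR column, aligned row by row
    out = df.copy()
    out['COLOR'] = out['URL'].apply(scrape)
    return out
```

Usage with the scrape function defined earlier would be: add_colors(pd.read_csv('scrape.csv'), scrape).to_csv('new_scrape.csv', index=False)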

CodePudding user response:

So if your CSV looks like:

URL
link1
link2
link3
....
linkN

We will go through each link and immediately write the value into the COLOR column:

from bs4 import BeautifulSoup
import requests
import pandas as pd


def scrape(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # select() returns a list; select_one() gives the first match directly
    return soup.select_one('.Property--value').text


df = pd.read_csv('scrape.csv')
for x in df.values:
    index = df[df['URL'] == x[0]].index.item()
    df.at[index, 'COLOR'] = scrape(x[0])


df.to_csv('new_scrape.csv', index=False)

New csv:

    URL  COLOR
  link1  value1
  link2  value2
  link3  value3
  linkN  valueN
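With 3,400+ URLs, some requests will inevitably time out or return pages without the element, and a single uncaught exception would abort the whole run. A more defensive variant (the timeout value and the empty-string fallback are my assumptions, not part of the answer above):

```python
from bs4 import BeautifulSoup
import requests


def extract_color(html):
    # Parse out the first .Property--value element; '' when absent
    element = BeautifulSoup(html, 'html.parser').select_one('.Property--value')
    return element.get_text(strip=True) if element else ''


def scrape(url):
    # A failed request (timeout, DNS error, HTTP error status) yields ''
    # instead of crashing the loop over all rows
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return ''
    return extract_color(response.content)
```

Rows that end up with an empty COLOR can then be re-run separately instead of restarting from scratch.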