I have a CSV file with over 3,400 URLs from one website, and I need to visit each URL with cloudscraper to pull a specific string. For simplicity, let's call the URLs PRODUCT and the string COLOR. Once I retrieve the COLOR of a PRODUCT, I want to save it in the row next to that PRODUCT in the CSV file. I've spent several hours going through the docs and other questions about scraping, but how to combine all of this is escaping me. Here is the code I have so far; I would really appreciate help piecing it together and making it work.
from bs4 import BeautifulSoup
import requests
import csv
import cloudscraper

def scrape(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    color = soup.select('.Property--value').text

with open('scrape.csv', 'w', newline='') as data:
    urls = csv.reader(data, delimiter=',')
    scraper = cloudscraper.create_scraper(delay=10, browser='chrome')
    info = scraper.get(urls)
    for row in urls:
        URL_GO = row[2]
        scrape(URL_GO)
    write = csv.writer(data, delimiter=' ')
    write.writerow(color)
Working Code Below:
from bs4 import BeautifulSoup
import pandas as pd
import cloudscraper

def scrape(url):
    scraper = cloudscraper.create_scraper(delay=.10, browser='chrome')
    response = scraper.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    div = soup.find("div", {'class': 'Property--value'})
    content = str(div)
    print(content)
    return content

df = pd.read_csv('scrape.csv')
for x in df.values:
    index = df[df['URL'] == x[0]].index.item()
    df.at[index, 'COLOR'] = scrape(x[0])
df.to_csv('new_scrape.csv', index=False)
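One caveat with the version above: `str(div)` stores the whole `<div>` markup in the COLOR column (or the literal string `None` when the class is missing). A minimal sketch of pulling just the text, with an inline HTML snippet standing in for a real PRODUCT page so it runs offline:

```python
from bs4 import BeautifulSoup

def extract_color(html):
    """Return the text of the first Property--value div, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', {'class': 'Property--value'})
    # get_text(strip=True) drops the surrounding tags and whitespace
    return div.get_text(strip=True) if div else None

# Illustrative snippet standing in for a fetched page
sample = '<div class="Property--value"> Red </div>'
print(extract_color(sample))  # -> Red
```

Returning `None` (instead of the string `"None"`) also makes missing values show up as NaN in the resulting DataFrame.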
CodePudding user response:
So if your CSV looks like:
URL
link1
link2
link3
....
linkN
We will go through each link and immediately write the value into the COLOR column:
from bs4 import BeautifulSoup
import requests
import pandas as pd

def scrape(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # select() returns a list, so use select_one() to get a single element's text
    return soup.select_one('.Property--value').text

df = pd.read_csv('scrape.csv')
for x in df.values:
    index = df[df['URL'] == x[0]].index.item()
    df.at[index, 'COLOR'] = scrape(x[0])
df.to_csv('new_scrape.csv', index=False)
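As an aside, the per-row index lookup can be avoided entirely with `apply`, which maps the scraper over the URL column in one line. A minimal sketch, with `scrape` stubbed out so it runs without network access:

```python
import pandas as pd

def scrape(url):
    # Stub standing in for the real requests/cloudscraper call
    colors = {'link1': 'value1', 'link2': 'value2'}
    return colors.get(url, 'unknown')

df = pd.DataFrame({'URL': ['link1', 'link2']})
df['COLOR'] = df['URL'].apply(scrape)
df.to_csv('new_scrape.csv', index=False)
```

With the real `scrape` from the answer above dropped in, the behavior is the same; `apply` just removes the need to look up each row's index by value.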
New CSV:
URL COLOR
link1 value1
link2 value2
link3 value3
linkN valueN