I'm trying to get the data for three states, based on the same URL format.
states = ['123', '124', '125']
urls = []
for state in states:
    url = f'www.something.com/geo={state}'
    urls.append(url)
and from there I have three separate URLs, each containing a different state ID.
However, when I get to processing them via BeautifulSoup, the output only showed data for state 123.
for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')
Subsequently, I extracted the columns I wanted using this:
listings = doc.select('.is-9-desktop')
rows = []
for listing in listings:
    row = {}
    try:
        row['name'] = listing.select_one('.result-title').text.strip()
    except:
        print("no name")
    try:
        row['add'] = listing.select_one('.address-text').text.strip()
    except:
        print("no add")
    try:
        row['mention'] = listing.select_one('.review-mention-block').text.strip()
    except:
        pass
    rows.append(row)
But as mentioned, it only showed data for state 123. I'd hugely appreciate it if anyone could let me know where I went wrong, thank you!
EDIT
I appended the parsed output of each URL into a list, and was able to get the data for all three states.
doc = []
for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    docs = BeautifulSoup(response.text, 'html.parser')
    doc.append(docs)
However, when I ran it through BeautifulSoup it resulted in the error message:
AttributeError: 'list' object has no attribute 'select'
Do I run it through another loop, something like the sketch below?
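This is just my rough guess at what that loop would look like, re-using the doc list from the edit above (the selectors are the same ones as before):

rows = []
for d in doc:  # each d is a single parsed BeautifulSoup page
    listings = d.select('.is-9-desktop')
    for listing in listings:
        row = {}
        try:
            row['name'] = listing.select_one('.result-title').text.strip()
        except AttributeError:
            print("no name")
        rows.append(row)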
CodePudding user response:
You do not need all of these loops - just iterate over the states and get the listings to append to rows.
The most important thing is that rows = []
is placed outside the for loop, so it does not get overwritten on each iteration.
Example
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup

states = ['123', '124', '125']
rows = []  # defined once, outside the loop, so results accumulate

for state in states:
    url = f'www.something.com/geo={state}'
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')

    listings = doc.select('.is-9-desktop')
    for listing in listings:
        row = {}
        try:
            row['name'] = listing.select_one('.result-title').text.strip()
        except:
            print("no name")
        try:
            row['add'] = listing.select_one('.address-text').text.strip()
        except:
            print("no add")
        try:
            row['mention'] = listing.select_one('.review-mention-block').text.strip()
        except:
            pass
        rows.append(row)
CodePudding user response:
As noted in the comments, there are some errors in your code. Try this version with the changes.
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup

states = ['123', '124', '125']
urls = []
for state in states:
    url = f'www.something.com/geo={state}'
    urls.append(url)

rows = []
for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')

    listings = doc.select('.is-9-desktop')
    for listing in listings:
        row = {}
        try:
            row['name'] = listing.select_one('.result-title').text.strip()
        except:
            print("no name")
        try:
            row['add'] = listing.select_one('.address-text').text.strip()
        except:
            print("no add")
        try:
            row['mention'] = listing.select_one('.review-mention-block').text.strip()
        except:
            pass
        rows.append(row)
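If you want to keep the results afterwards, a minimal follow-up sketch could write rows to a CSV file (the filename here is just an example; restval fills in columns missing from a row):

import csv

fieldnames = ['name', 'add', 'mention']
with open('listings.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)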