AttributeError only on second iteration of for loop

I have a CSV file I'm iterating through for links, then taking those links and web scraping with them. A very similar program that takes the URLs as manual input still works, and this program works fine on the first iteration of the for loop. The problem is that the second iteration grabs the correct URL, opens the correct page in Chrome, and then raises an AttributeError on header = soup.select('div.z-list.mystories'):

AttributeError: ResultSet object has no attribute 'select'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Substituting find_all(), find(), or select_one() produces the same error.

I'm lost: I need to iterate through all of these URLs, and it works fine on the first iteration but breaks afterwards. Any ideas?

import re
from selenium import webdriver
from bs4 import BeautifulSoup as soup
from time import sleep
import csv

filename = 'Author CSV.csv' 
with open(filename, 'r') as csvfile:
       driver = webdriver.Chrome(executable_path='C:/Users/Curious Beats/Downloads/chromedriver.exe')
       datareader = csv.reader(csvfile)

       for row in datareader:
            try: 
                url = row[0]       
                
                driver.get(url)
            
                sleep(1)
                
                soup = soup(driver.page_source, "lxml")
                
                list_header = []
                header = soup.select('div.z-list.mystories')
                for items in header:
                        try:
                            follows = items.get_text().split()
                            list_header.append(follows[follows.index('Follows:') + 1])
                        except:
                            
                            continue
                
                
                driver.quit()
                list_header
                list = [re.sub(',', '', _) for _ in list_header]
                for i in range(0, len(list)): 
                    list[i] = int(list[i]) 
                sum = sum(list)
                print(sum)
            except IndexError:
                print('blank')

CodePudding user response:

soup = soup(driver.page_source, "lxml") rebinds the name soup: after the first iteration it no longer refers to the BeautifulSoup class you imported as soup, but to the BeautifulSoup object built from the first page. Calling a BeautifulSoup object is shorthand for find_all(), so on the second iteration soup(driver.page_source, "lxml") does not raise an exception; it just returns a ResultSet, and a ResultSet has no select() method, which is exactly the error you see.
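
A minimal sketch of a fix, assuming the same CSV layout and selector: keep the imported class under the name BeautifulSoup and parse each page into a separate variable, so the class is never shadowed (page_soup and counts are new names introduced here for illustration). driver.quit() is also moved out of the loop so the browser stays open for later rows.

import re
import csv
from time import sleep

from selenium import webdriver
from bs4 import BeautifulSoup  # keep the class name distinct from any parsed page

filename = 'Author CSV.csv'
with open(filename, 'r') as csvfile:
    driver = webdriver.Chrome(executable_path='C:/Users/Curious Beats/Downloads/chromedriver.exe')
    datareader = csv.reader(csvfile)

    for row in datareader:
        try:
            url = row[0]
            driver.get(url)
            sleep(1)

            # parse into a new variable instead of rebinding the class
            page_soup = BeautifulSoup(driver.page_source, "lxml")

            list_header = []
            for items in page_soup.select('div.z-list.mystories'):
                try:
                    follows = items.get_text().split()
                    list_header.append(follows[follows.index('Follows:') + 1])
                except ValueError:
                    # 'Follows:' not present in this block; skip it
                    continue

            # strip thousands separators and sum the follow counts
            counts = [int(re.sub(',', '', value)) for value in list_header]
            print(sum(counts))
        except IndexError:
            print('blank')

driver.quit()  # quit once, after every row has been processed

Renaming the parsed-page variable (or importing the class as BeautifulSoup rather than soup) is enough on its own; the other changes just avoid shadowing the built-ins list and sum.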
