I'm web scraping for search terms. I want to find the suggested terms on a page and, for each suggested term, get a further list of suggested terms, which I would like to append to the original list. Below is my code so far:
url = ' starting url '
driver.get(url)

def search(url):
    new_urls = []
    searches = []
    new_searches = []
    try:
        #Get suggested terms
        suggestions = driver.find_element_by_xpath('//*[@id="search-associates"]').find_elements_by_tag_name('a')
        for i in suggestions:
            #add each term to the list searches
            searches =[i.text]
            #Get urls of suggested terms to visit
            new_urls = [i.get_attribute('href')]
    except:
        pass
    #Filter for duplicate suggested terms and blanks
    new_searches = [x for x in new_searches if x != None and x not in searches ]
    print(new_searches)
    print(searches)
    #Visit the first url in the list (just for testing purposes)
    driver.get(new_urls[0])
    #call function to get info from new_url
    search(new_urls[0])

#Initial function call
search(url)
I have the following problems:
The searches list is reset every time the function is called again, so the new suggested terms are not appended to the original list.
My code for filtering duplicates and blanks doesn't work.
Does anyone know how to fix these problems? I'd greatly appreciate any help.
Thank you to anyone who took the time to read my problem.
Edit: If anyone wants a sample of the output I'm looking for, say I visit a URL and the first set of suggested terms is this:
searches = ['apples', 'bananas', 'cherries', 'grapes', 'oranges']
Say I then visit the URL for apples and get the following suggested terms:
searches = [ 'bananas', 'cherries', 'tangerines', 'grapefruit']
I want the search list to be updated to:
searches = ['apples', 'bananas', 'cherries', 'grapes', 'oranges','bananas', 'cherries', 'tangerines', 'grapefruit']
I want new_searches to be updated to:
new_searches = ['apples', 'bananas', 'cherries', 'grapes', 'oranges','tangerines', 'grapefruit']
As you can see the duplicates are removed.
The problem I'm getting is that new_searches is not filtering the searches list at all; it's outputting []. The other problem I have is that searches is not appending the new terms to the previous list. It creates a new list instead.
CodePudding user response:
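If anything inside the try block raises, the bare except: pass silently swallows the error, so it can look as though nothing happened. Also, searches = [i.text] replaces the list on every iteration instead of adding to it.
You can add items to a list with:
searches.append(i.text)
or without the .text if you want to store the element itself. Note that str(i) gives the element's string representation, not its link text, so use i.text when you only want strings.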
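To illustrate the difference between rebinding a list and appending to it, here is a minimal sketch that uses plain strings as stand-ins for the Selenium elements. The sample terms come from the question; the names reassigned and appended are only for illustration.

```python
# Plain strings standing in for the text of each suggested link (assumption).
suggestions = ['apples', 'bananas', 'cherries']

# Assigning a new one-element list on each pass discards the previous contents.
searches = []
for text in suggestions:
    searches = [text]
reassigned = searches

# append() grows the same list in place, so earlier items are kept.
searches = []
for text in suggestions:
    searches.append(text)
appended = searches

print(reassigned)  # ['cherries']
print(appended)    # ['apples', 'bananas', 'cherries']
```

The first loop ends with only the last term, which matches the symptom of the list appearing to reset.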
CodePudding user response:
I would suggest a different approach. Create the list outside the loop (and outside the recursive function, so it is not reset on each call) and keep adding the search results of each iteration to it, regardless of duplicates. Once the complete list is built, convert it to a set to remove the duplicates and keep only the unique values.
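A rough sketch of that idea, with the browser left out so it can run on its own. FAKE_SUGGESTIONS and collect are hypothetical stand-ins for the page lookups, filled with the sample terms from the question. Since a set does not keep insertion order, the sketch also shows dict.fromkeys, which removes duplicates while preserving first-seen order.

```python
# Hypothetical stand-in for the site: each term maps to its suggested terms.
FAKE_SUGGESTIONS = {
    'start': ['apples', 'bananas', 'cherries', 'grapes', 'oranges'],
    'apples': ['bananas', 'cherries', 'tangerines', 'grapefruit'],
}

# Created once, outside the function, so it persists across calls.
searches = []

def collect(term):
    """Append every non-blank suggestion for `term`, duplicates included."""
    for suggestion in FAKE_SUGGESTIONS.get(term, []):
        if suggestion:  # skips None and empty strings
            searches.append(suggestion)

collect('start')
collect('apples')

# A set gives the unique values, but in no guaranteed order.
unique_unordered = set(searches)

# dict.fromkeys also removes duplicates and keeps first-seen order.
new_searches = list(dict.fromkeys(searches))

print(searches)
print(new_searches)
```

With the sample data, searches holds all nine terms in visit order and new_searches holds the seven distinct ones, which is the output described in the question's edit.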