How to modify and extend list in python

Time:09-24

I'm web scraping for search terms. I want to find the suggested terms and, for each suggested term, get a new list of additional suggested terms (which I would like to append to the original list). Below is my code so far:

url = ' starting url  '
driver.get(url)

def search(url):
    new_urls = []
    searches = []
    new_searches = []

     
    try:
        #Get suggested terms
        suggestions = driver.find_element_by_xpath('//*[@id="search-associates"]').find_elements_by_tag_name('a')
        for i in suggestions:
            #add each term to the list searches
            searches =[i.text]

            #Get urls of suggested terms to visit
            new_urls  = [i.get_attribute('href')]
        
    except:
        pass
   

    #Filter for duplicate suggested terms and blanks
    new_searches  = [x for x in new_searches if x != None and x not in searches ]
    
    print(new_searches)
    print(searches)
    
    #Visit the first url in the list (just for testing purposes)
    driver.get(new_urls[0])
    
    #call function to get info from new_url
    search(new_urls[0])   
 
       
#Initial function call
search(url)

I have the following problems:

  1. The search list refreshes every time the function is called again. The new suggested terms are not appending to the original list.

  2. My code for filtering duplicates and blanks doesn't work.

Does anyone know how to fix these problems? If so, I greatly appreciate your help.

Thank you to anyone who took the time to read my problem and help me.

Thank you.

Edit: If anyone wants a sample of the output I'm looking for, say I visit a url and the first set of suggested terms is this:

searches = ['apples', 'bananas', 'cherries', 'grapes', 'oranges']

Say I visit the url for apples and get the following suggested terms:

searches = [ 'bananas', 'cherries', 'tangerines', 'grapefruit']

I want the search list to be updated to:

searches = ['apples', 'bananas', 'cherries', 'grapes', 'oranges','bananas', 'cherries', 'tangerines', 'grapefruit']

I want new_searches to be updated to:

new_searches = ['apples', 'bananas', 'cherries', 'grapes', 'oranges','tangerines', 'grapefruit']

As you can see the duplicates are removed.

The problem I'm getting is that new_searches is not filtering the searches list at all; it's outputting []. The other problem is that searches is not appending the new terms to the previous list; it creates a new list instead.

CodePudding user response:

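The line searches = [i.text] does not add anything to the list. It rebinds searches to a new one-element list on every pass through the loop, so only the last term survives. The bare except: pass also hides any error raised inside the try block, so a failure there goes unnoticed. You can add items to a list with:

searches.append(i.text)

Keep the .text if you want strings. Appending i on its own stores the element object, and str(i) gives the element's representation rather than its link text.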
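
A minimal sketch of appending to one list that is shared across the recursive calls, so that it does not refresh each time. It uses plain strings instead of Selenium elements. The SUGGESTIONS data, the collect helper and the depth limit are all made up for illustration and are not part of the original code:

```python
# Hypothetical stand-in for the scraped pages: each term maps to the
# suggested terms that its page would show.
SUGGESTIONS = {
    "start": ["apples", "bananas", "cherries", "grapes", "oranges"],
    "apples": ["bananas", "cherries", "tangerines", "grapefruit"],
}


def collect(term, searches, depth=1):
    """Append the suggestions for `term` to the shared `searches` list."""
    found = SUGGESTIONS.get(term, [])
    # extend() mutates the caller's list. Writing `searches = found` here
    # would only rebind the local name and lose the earlier terms.
    searches.extend(found)
    if depth > 0 and found:
        # Follow only the first suggestion, as the question's test code does.
        # The depth limit (an assumption) keeps the recursion from running forever.
        collect(found[0], searches, depth - 1)
    return searches


searches = collect("start", [])
print(searches)
```

Because the list is created once by the caller and passed in, every call adds to the same object instead of starting from an empty list.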

CodePudding user response:

I would suggest a different approach. Create the list outside the function and keep adding each iteration's search results to it, regardless of duplicates. Once the complete list is built, convert it to a set to remove the duplicates and keep only the unique values. Note that a set does not preserve the original order.
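
The set conversion described above could look like the sketch below. The sample list reuses the terms from the question, with a blank string and a None added as assumed examples of the blanks to filter out. The dict.fromkeys line is an alternative, not part of the suggestion itself, for when the order of first appearance matters:

```python
# Sample data: the combined list from the question, plus an assumed
# blank string and None to show the blank filtering.
searches = ["apples", "bananas", "cherries", "grapes", "oranges",
            "bananas", "cherries", "tangerines", "grapefruit", "", None]

# Drop blanks and None first; both are falsy.
non_blank = [s for s in searches if s]

# A set removes duplicates but does not keep the original order.
unique_unordered = set(non_blank)

# dict.fromkeys keeps the first occurrence of each term, in order
# (dicts preserve insertion order in Python 3.7+).
new_searches = list(dict.fromkeys(non_blank))
print(new_searches)
```

With this data, new_searches holds the seven unique terms in the order the question asks for.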
