Parameter passing in function

Time:01-22

Seems like this has been addressed over the years that Python has been around, but here goes anyway:

from bs4 import BeautifulSoup

def soupstrainer(tag_element, srch_str):
    ''' take a soup element
        return a list of found items
    '''
    results = []
    ### the literal search arguments return results, even though two lines down,
    ### print(srch_str) prints the expected text
    souper = tag_element.find_all('a', {'data-tn-element': 'companyName'})  # find_all(srch_str) returns nothing
    print(srch_str)
    for r in souper:
        if r is not None:
            results.append(r.get_text(strip=True))
    return results

with open('scrapesnip.html', 'r') as the_file:
    doc4 = the_file.read()

soup = BeautifulSoup(doc4, 'html.parser')
result = soupstrainer(soup,str("'a',{'data-tn-element':'companyName'}"))
print(result,len(result))

Results:

## zero results passing the string
/PYscripts/bravosierra4.py
'a',{'data-tn-element':'companyName'}        <=== these two strings *look* identical
[] 0

## with the identical string
## 'hard coded' into the function
/PYscripts/bravosierra4.py
'a',{'data-tn-element':'companyName'}         <=== these two strings *look* identical
['Keysight Technologies', 'ECS Federal LLC', 'Corsica Technologies, LLC', 'Caribou', 'Collins Aerospace', 'Travelers', 'CyberCoders', 'HealthVerity', 'Circadence Corporation'] 9

Am I passing srch_str incorrectly?

Answer:

I'm not sure exactly how you're passing srch_str, but this:

souper = tag_element.find_all('a', {'data-tn-element': 'companyName'})

is not the same as this:

srch_str = "'a', {'data-tn-element': 'companyName'}"
souper = tag_element.find_all(srch_str)

In the first case, you're passing a string and a dict as separate arguments. In the second case, you're passing a single string. Code that you put into a string variable doesn't get evaluated as code inside of other expressions (and it would be a really big problem if it did).
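To see where each piece ends up, here is a minimal sketch using a hypothetical stand-in, fake_find_all, whose first two parameters mirror the name and attrs parameters of BeautifulSoup's find_all. It does not search anything; it only reports how Python bound the arguments.

```python
def fake_find_all(name=None, attrs=None):
    """Hypothetical stand-in for find_all: report how the arguments were bound."""
    return {"name": name, "attrs": attrs}

# Two separate arguments: a tag name and an attribute dict.
separate = fake_find_all('a', {'data-tn-element': 'companyName'})

# One string that merely looks like the two arguments above.
srch_str = "'a', {'data-tn-element': 'companyName'}"
packed = fake_find_all(srch_str)

print(separate)
print(packed)
```

With the packed string, the whole text lands in name and attrs stays None. The real find_all then looks for a tag literally named "'a', {'data-tn-element': 'companyName'}", finds none, and returns an empty list rather than raising an error, which is why the failure is silent.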

You could do this instead:

def soupstrainer(tag_element, *srch_args):
    """take a soup element and search args, return a list of found items"""
    souper = tag_element.find_all(*srch_args)
    return [r.get_text(strip=True) for r in souper if r is not None]

...
result = soupstrainer(soup, 'a', {'data-tn-element': 'companyName'})

so that soupstrainer just takes the search arguments as separate arguments (instead of packing them into a single string) and passes them straight along to find_all.
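The same pass-through idea extends to keyword arguments such as attrs= or limit= (both real find_all parameters) via **kwargs. The sketch below is an assumption-laden illustration: FakeTag is a hypothetical stand-in for a soup element that records each call instead of searching, so the example runs without BeautifulSoup installed; with real HTML you would pass your soup object instead.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup element: records find_all calls."""

    def __init__(self):
        self.calls = []

    def find_all(self, *args, **kwargs):
        self.calls.append((args, kwargs))
        return []  # a real find_all would return the matching tags


def soupstrainer(tag_element, *srch_args, **srch_kwargs):
    """Forward positional and keyword search arguments unchanged to find_all."""
    souper = tag_element.find_all(*srch_args, **srch_kwargs)
    return [r.get_text(strip=True) for r in souper]


tag = FakeTag()
soupstrainer(tag, 'a', {'data-tn-element': 'companyName'})
soupstrainer(tag, 'a', attrs={'data-tn-element': 'companyName'}, limit=5)
print(tag.calls)
```

Because soupstrainer never inspects the arguments, any call that works with find_all directly also works through the wrapper.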

Answer:

It looks as though find_all needs the tag name and the attribute dict as separate arguments, so when they arrive packed into a single string, nothing is returned. Here's how I successfully coded the line that calls the function:

result = soupstrainer(soup,['a',{'data-tn-element':'companyName'}])

and recoded the function like this:

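def soupstrainer(tag_element, srch_str):
    ''' take a soup element
        return a list of found items
    '''
    results = []
    # srch_str is now a list: [tag name, attribute dict]
    souper = tag_element.find_all(srch_str[0], srch_str[1])
    for r in souper:
        if r is not None:
            results.append(r.get_text(strip=True))
    return results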
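If the search arguments really have to arrive as one string (for example read from a config file; that scenario is an assumption, not something from the question), a possible approach is ast.literal_eval from the standard library. It parses Python literals only, never function calls, so it can safely turn the text into a real tuple, which can then be unpacked into find_all with *.

```python
import ast

srch_str = "'a',{'data-tn-element':'companyName'}"

# literal_eval accepts only literals (strings, dicts, tuples, numbers, ...),
# so this yields the tuple ('a', {'data-tn-element': 'companyName'}).
srch_args = ast.literal_eval(srch_str)
print(srch_args)

# With a real soup object, the tuple (or the list from the answer above)
# can be unpacked instead of indexed:
#     soup.find_all(*srch_args)
```

Unpacking with * also means the function keeps working if the list gains a third argument later, whereas srch_str[0], srch_str[1] is fixed at two.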