If a string contains any item from a list

I want to check whether any of the excluded sites show up in the links. I can get it to work with just one site, but as soon as I make it a list, it fails at if donts in thingy: with

TypeError: 'in <string>' requires string as left operand, not tuple

This is my code:

import requests 
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re

url = ("http://stackoverflow.com")

donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)

soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = (link.get('href'))
    thingy = str(thingy)
    if donts in thingy:
        pass
    else:
        print(thingy)

Answer:

import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com"

donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)

soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = str(link.get('href'))
    # String on the left, tuple on the right: an exact-match membership test
    if thingy in donts:
        print(thingy)

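Check: string in tuple, i.e. the string is the left operand and the tuple is the right one. Note that this is an exact-match test against each element of the tuple, not a substring search.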
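To make the exact-match behaviour concrete, here is a small sketch. The helper in_tuple is hypothetical, written only to spell out the rule the Python reference gives for tuples: x in y behaves like any(x is e or x == e for e in y). The full URL is a made-up example.

```python
donts = ('stackoverflow.com', 'stackexchange.com')

def in_tuple(item, items):
    """Hypothetical helper: spells out what `item in items` does for a tuple,
    an identity-or-equality check against each element."""
    return any(item is element or item == element for element in items)

for candidate in ('stackoverflow.com', 'https://stackoverflow.com/questions'):
    # Both columns agree: True True for the bare domain, False False for the URL
    print(candidate in donts, in_tuple(candidate, donts))
```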

Answer:

The crux of your problem is how you're searching your excluded list:

excluded = ("a", "b", "c")
links = ["a", "d", "e"]

for site in links:
    if site not in excluded:  # We want to know if the site is in the excluded list
        print(f"Site not excluded: {site}")

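Reverse the order of the operands (the single link on the left, the tuple on the right) and the TypeError goes away. I've also inverted your logic here so you can skip the unnecessary pass. Keep in mind that a membership test against a tuple compares each element for equality, so this only skips a link whose whole href exactly matches one of the entries; to skip any href that merely contains one of the domains, you need a substring test for each entry.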
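A minimal offline sketch of why the operand order matters, using the question's donts tuple and a made-up href (no requests or BeautifulSoup needed):

```python
donts = ('stackoverflow.com', 'stackexchange.com')
thingy = 'https://stackoverflow.com/questions'  # hypothetical href for illustration

# Original order: a string's `in` only accepts another string on the left.
try:
    _ = donts in thingy
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Reversed order: tuple membership, comparing thingy to each element.
reversed_result = thingy in donts               # the full URL equals neither entry
exact_result = 'stackoverflow.com' in donts     # an exact element matches
print(reversed_result, exact_result)
```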

As a side note, this is one reason clear variable names help: they make it easier to reason about what the logic should be doing. That is especially useful in Python, where an expression like site not in excluded reads almost like plain English.
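Applied to the question's loop with more descriptive names, the condition reads like the sentence it stands for. In this sketch scraped_hrefs and excluded_sites are illustrative names, and the hrefs are made up so it runs offline. Because the test is exact equality, only the bare 'stackoverflow.com' entry is dropped; the full stackoverflow.com URL is still kept.

```python
# Hypothetical hrefs standing in for the links scraped from the page.
scraped_hrefs = [
    'stackoverflow.com',
    'https://stackoverflow.com/questions',
    'https://example.com/',
]
excluded_sites = ('stackoverflow.com', 'stackexchange.com')

kept_hrefs = []
for href in scraped_hrefs:
    # Reads as: "if href is not one of the excluded sites" (exact match)
    if href not in excluded_sites:
        kept_hrefs.append(href)
        print(f"Site not excluded: {href}")
```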

Answer:

import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com"

donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)

soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = str(link.get('href'))
    # any() is True as soon as one excluded domain is a substring of the href
    if not any(d in thingy for d in donts):
        print(thingy)
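The any() version puts each excluded domain on the left of in one at a time, so every test is string-in-string, and it stops at the first match. One caveat of substring matching is that a domain anywhere in the URL counts, including a query string. The sketch below contrasts that with a stricter hostname check built on urllib.parse.urlparse; contains_any and is_excluded_host are hypothetical helper names, and treating subdomains as excluded is an assumption about what "excluded site" should mean, not something the question specifies.

```python
from urllib.parse import urlparse

donts = ('stackoverflow.com', 'stackexchange.com')

def contains_any(text, needles):
    """Hypothetical helper: substring matching, as in the any() answer."""
    return any(needle in text for needle in needles)

def is_excluded_host(url, domains):
    """Hypothetical helper: True if the URL's hostname is one of `domains`
    or a subdomain of one (an assumed, stricter definition of "excluded")."""
    host = (urlparse(url).hostname or '').lower()
    return any(host == d or host.endswith('.' + d) for d in domains)

tracking = 'https://example.com/?ref=stackoverflow.com'
print(contains_any(tracking, donts))      # True: the domain appears in the query string
print(is_excluded_host(tracking, donts))  # False: the host is example.com
print(is_excluded_host('https://meta.stackoverflow.com/', donts))  # True: subdomain
```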