if string contains from list

Time:09-30

I want to check whether any of the excluded sites show up in a link. It works with a single site, but as soon as I make it a list of sites, it errors at the line if donts in thingy:

TypeError: 'in <string>' requires string as left operand, not tuple

This is my code:

import requests
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re

url = "http://stackoverflow.com"
donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = str(link.get('href'))
    if donts in thingy:  # the TypeError is raised here
        pass
    else:
        print(thingy)

CodePudding user response:

import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com"
donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = str(link.get('href'))
    # Test the string against the tuple, not the tuple against the string.
    if thingy in donts:
        print(thingy)

The check has to be string in tuple, not tuple in string. Note that this is an exact-match test: it is true only when the href is identical to one of the entries in donts, so this loop prints only hrefs that equal an entry exactly.

CodePudding user response:

The crux of your problem is how you're searching your excluded list:

excluded = ("a", "b", "c")
links = ["a", "d", "e"]

for site in links:
    if site not in excluded:  # We want to know if the site is in the excluded list
        print(f"Site not excluded: {site}")

Reverse the order of the operands (test the string against the collection, not the collection against the string) and this should work fine. I've also inverted your condition so you can skip the unnecessary pass.

As a side note, this is one reason clear variable names help: they make it easier to reason about what the logic should be doing. That is especially useful in Python, where readable operators like in exist.
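
One thing worth checking with this approach: in against a tuple tests whether the whole string equals one of the elements, which is not the same as asking whether the string contains one of them. A minimal sketch of the difference, using a made-up href for illustration (the variable names here are mine, not from the question):

```python
excluded = ("stackoverflow.com", "stackexchange.com")
href = "https://stackoverflow.com/questions"  # hypothetical example link

# Exact membership: is the whole string one of the tuple's elements?
exact_match = href in excluded

# Substring match: does any excluded domain occur inside the string?
substring_match = any(domain in href for domain in excluded)

print(exact_match, substring_match)  # prints: False True
```

If the links are full URLs, as they are in the question, the substring form is probably the one you want.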

CodePudding user response:

import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com"
donts = ('stackoverflow.com', 'stackexchange.com')

r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')

for link in soup.select('a[href*="http"]'):
    thingy = str(link.get('href'))
    # Print the link only if none of the excluded sites occurs in it.
    if not any(d in thingy for d in donts):
        print(thingy)
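
A possible refinement, offered as a sketch rather than a required change: a plain substring test also matches when an excluded domain appears somewhere other than the host, for example in a query string. Comparing against the parsed hostname avoids that. The helper name is_excluded and the sample links below are assumptions made for illustration; urlparse comes from the standard library.

```python
from urllib.parse import urlparse

EXCLUDED_DOMAINS = ("stackoverflow.com", "stackexchange.com")


def is_excluded(href, excluded=EXCLUDED_DOMAINS):
    """Return True if href's host is an excluded domain or a subdomain of one."""
    # hostname is None for strings that are not absolute URLs.
    host = urlparse(href).hostname or ""
    return any(host == d or host.endswith("." + d) for d in excluded)


# Hypothetical sample links standing in for the scraped hrefs.
links = [
    "https://stackoverflow.com/questions",
    "https://meta.stackexchange.com",
    "https://example.com/?ref=stackoverflow.com",
]

kept = [href for href in links if not is_excluded(href)]
print(kept)  # prints: ['https://example.com/?ref=stackoverflow.com']
```

In the scraping loop, the condition would then become if not is_excluded(thingy).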