Checking if certain phrases are on a website with Python

Time:11-28

I've written a function that is meant to check whether a phrase appears on a certain website. However, it always tells me the phrase isn't on the website, even when it is. I'm relatively new to web scraping, so any help would be appreciated.

def check_availability(url,phrase):
    global log
    try:
        # page = urllib.request.urlopen(url)
        r = requests.get(url)
        soup = BeautifulSoup(url, 'html.parser')

        if phrase in soup.text:
            return False
        return True
    except:
        log  = "Error parsing website "

This always returns True for some reason. Please help.

CodePudding user response:

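The original passes the URL string itself to BeautifulSoup instead of the downloaded page (the response r is never used), so soup.text holds only the URL and the phrase is never found. Parse the response content instead. Note that the version below returns True when the phrase is present, whereas the original returned False in that case.

Modified function: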

import requests
from bs4 import BeautifulSoup

def url_contains(url, phrase):
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    return phrase in soup.get_text()

Example:

url = 'https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss'

>>> url_contains(url, 'Princeps mathematicorum')
True

>>> url_contains(url, 'foo bar')
False
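
One thing to watch for: get_text() keeps the whitespace of the underlying HTML, so a phrase that is broken across a line in the markup, or that uses a non-breaking space, will not match a plain `in` test. Below is a minimal sketch of a more forgiving comparison. The helper name phrase_in_text and the ignore_case flag are my own, not part of BeautifulSoup, and the sample string just stands in for soup.get_text() so it runs without a network connection.

```python
import re

def phrase_in_text(text, phrase, ignore_case=True):
    """Return True if phrase occurs in text, treating any whitespace run as one space.

    Hypothetical helper for illustration; pass soup.get_text() as `text`.
    """
    def normalize(s):
        # Collapse newlines, tabs, and non-breaking spaces into single spaces.
        s = re.sub(r"\s+", " ", s).strip()
        return s.casefold() if ignore_case else s
    return normalize(phrase) in normalize(text)

# Stand-in for soup.get_text(): the phrase is split across a line break.
page_text = "Gauss is called the Princeps\n   mathematicorum by some."

print("Princeps mathematicorum" in page_text)                # False: plain substring test
print(phrase_in_text(page_text, "princeps mathematicorum"))  # True: normalized test
```

Whether case-insensitive matching is what you want depends on the phrase; pass ignore_case=False to keep the comparison exact apart from whitespace.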

Slightly optimized:

import requests
from bs4 import BeautifulSoup
from functools import lru_cache

@lru_cache(maxsize=4)
def get_soup(url):
    return BeautifulSoup(requests.get(url).content, 'html.parser')

def url_contains(url, phrase):
    return phrase in get_soup(url).get_text()

This caches the soup obtained from a URL, so you can query many phrases against the same URL without downloading and parsing the page again. For the example above, the first query takes about 1/3 s; subsequent queries against that URL take about 4 ms.
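
To see the caching at work without touching the network, here is a small sketch in which the download is replaced by a hypothetical stand-in, get_page_text, that just counts how often it runs. The counter and the fake page text are assumptions for illustration only; lru_cache and cache_info() are the real functools API.

```python
from functools import lru_cache

fetch_count = 0  # counts how often the simulated download actually runs

@lru_cache(maxsize=4)
def get_page_text(url):
    """Hypothetical stand-in for get_soup(url).get_text(); no network access."""
    global fetch_count
    fetch_count += 1
    return f"text of {url} mentioning Princeps mathematicorum"

def url_contains(url, phrase):
    return phrase in get_page_text(url)

url = "https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss"
url_contains(url, "Princeps mathematicorum")  # first call: cache miss, "downloads"
url_contains(url, "foo bar")                  # second call: served from the cache

print(fetch_count)                 # 1
print(get_page_text.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=4, currsize=1)
```

Two consequences of this design: with maxsize=4 only the four most recently used URLs stay cached, and a cached page is never refreshed while the process runs. If the site's content may have changed, calling get_page_text.cache_clear() (or get_soup.cache_clear() in the real version) forces a fresh download on the next query.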
