Home > OS >  No results no errors found on this code from geeksforgeeks.org
No results no errors found on this code from geeksforgeeks.org

Time:10-08

Newbee here.

I found this code on geeksforgeeks.org When running this code on VS code. I get not results any idea why? Source: https://www.geeksforgeeks.org/python-program-crawl-web-page-get-frequent-words/


# Python3 program for a word frequency
# counter after crawling/scraping a web-page
import requests
from bs4 import BeautifulSoup
import operator
from collections import Counter

'''Function defining the web-crawler/core
spider, which will fetch information from
a given website, and push the contents to
the second function clean_wordlist()'''


def start(url):

    # empty list to store the contents of
    # the website fetched from our web-crawler
    wordlist = []
    source_code = requests.get(url).text

    # BeautifulSoup object which will
    # ping the requested url for data
    soup = BeautifulSoup(source_code, 'html.parser')

    # Text in given web-page is stored under
    # the <div> tags with class <entry-content>
    for each_text in soup.findAll('div', {'class': 'entry-content'}):
        content = each_text.text

        # use split() to break the sentence into
        # words and convert them into lowercase
        words = content.lower().split()

        for each_word in words:
            wordlist.append(each_word)
        clean_wordlist(wordlist)

# Function removes any unwanted symbols


def clean_wordlist(wordlist):

    clean_list = []
    for word in wordlist:
        symbols = "!@#$%^&*()_- ={[}]|\;:\"<>?/., "

        for i in range(len(symbols)):
            word = word.replace(symbols[i], '')

        if len(word) > 0:
            clean_list.append(word)
    create_dictionary(clean_list)

# Creates a dictionary containing each word's
# count and top_20 occurring words


def create_dictionary(clean_list):
    word_count = {}

    for word in clean_list:
        if word in word_count:
            word_count[word]  = 1
        else:
            word_count[word] = 1

    ''' To get the count of each word in
        the crawled page -->

    # operator.itemgetter() takes one
    # parameter either 1(denotes keys)
    # or 0 (denotes corresponding values)

    for key, value in sorted(word_count.items(),
                    key = operator.itemgetter(1)):
        print ("% s : % s " % (key, value))

    <-- '''

    c = Counter(word_count)

    # returns the most occurring elements
    top = c.most_common(10)
    print(top)


# Driver code
if __name__ == '__main__':
    url = "https://www.geeksforgeeks.org/programming-language-choose/"
    # starts crawling and prints output
    start(url)

I tried to run on the console and in Visual Studio code and I get same issue no results. based on the post, I should get these results. [('to', 10), ('in', 7), ('is', 6), ('language', 6), ('the', 5), ('programming', 5), ('a', 5), ('c', 5), ('you', 5), ('of', 4)]

CodePudding user response:

Open that page in browser, click right, select Inspect. Then click anywhere on the page source code opened on the bottom (or to the right), and select hit Ctrl-F. A search field will appear: type there div//[@class='entry-content'] - and you will see there are no results. Apparently that page' structure changed since they published that tutorial. What you can do is change this line:

for each_text in soup.find_all('div', {'class': 'entry-content'})

to this :

for each_text in soup.find_all('div', {'class': 'text'})

You will get (your) some results, based on those elements content.

  • Related