No results and no errors when running this code from geeksforgeeks.org

Newbie here.

I found this code on geeksforgeeks.org. When I run it in VS Code, I get no results and no errors. Any idea why? Source: https://www.geeksforgeeks.org/python-program-crawl-web-page-get-frequent-words/


# Python3 program for a word frequency
# counter after crawling/scraping a web-page
import requests
from bs4 import BeautifulSoup
import operator
from collections import Counter

'''Function defining the web-crawler/core
spider, which will fetch information from
a given website, and push the contents to
the second function clean_wordlist()'''


def start(url):

    # empty list to store the contents of
    # the website fetched from our web-crawler
    wordlist = []
    source_code = requests.get(url).text

    # BeautifulSoup object which will
    # ping the requested url for data
    soup = BeautifulSoup(source_code, 'html.parser')

    # Text in given web-page is stored under
    # the <div> tags with class <entry-content>
    for each_text in soup.findAll('div', {'class': 'entry-content'}):
        content = each_text.text

        # use split() to break the sentence into
        # words and convert them into lowercase
        words = content.lower().split()

        for each_word in words:
            wordlist.append(each_word)
        clean_wordlist(wordlist)

# Function removes any unwanted symbols


def clean_wordlist(wordlist):

    clean_list = []
    for word in wordlist:
        symbols = "!@#$%^&*()_-+={[}]|\\;:\"<>?/., "

        for i in range(len(symbols)):
            word = word.replace(symbols[i], '')

        if len(word) > 0:
            clean_list.append(word)
    create_dictionary(clean_list)

# Creates a dictionary containing each word's
# count and top_20 occurring words


def create_dictionary(clean_list):
    word_count = {}

    for word in clean_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1

    ''' To get the count of each word in
        the crawled page -->

    # operator.itemgetter() takes one
    # parameter either 1(denotes keys)
    # or 0 (denotes corresponding values)

    for key, value in sorted(word_count.items(),
                    key = operator.itemgetter(1)):
        print ("% s : % s " % (key, value))

    <-- '''

    c = Counter(word_count)

    # returns the most occurring elements
    top = c.most_common(10)
    print(top)


# Driver code
if __name__ == '__main__':
    url = "https://www.geeksforgeeks.org/programming-language-choose/"
    # starts crawling and prints output
    start(url)

I tried running it both from the console and in Visual Studio Code, and I get the same issue: no results. Based on the post, I should get this output: [('to', 10), ('in', 7), ('is', 6), ('language', 6), ('the', 5), ('programming', 5), ('a', 5), ('c', 5), ('you', 5), ('of', 4)]

CodePudding user response：

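Open that page in a browser, right-click, and select Inspect. Then click anywhere in the page source panel that opens at the bottom (or on the right) and press Ctrl-F. A search field will appear; type //div[@class='entry-content'] there, and you will see that there are no results. Apparently that page's structure has changed since the tutorial was published. Because the search matches nothing, the body of the for loop never runs, so clean_wordlist is never called and nothing is printed, which is why you see neither results nor errors. What you can do is change this line (findAll in your code is the older name for find_all):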

for each_text in soup.find_all('div', {'class': 'entry-content'}):

to this:

for each_text in soup.find_all('div', {'class': 'text'}):

You will then get some results, based on the content of those elements.
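To make this kind of silent failure visible, here is a minimal sketch that reports when no matching element is found instead of printing nothing. It runs on an inline HTML string rather than the live page, so SAMPLE_HTML and the helper name top_words are made up for illustration and are not part of the tutorial. It also counts with Counter directly instead of building a dictionary by hand.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page. Note that it has no
# <div class="entry-content">, only a <div class="text">.
SAMPLE_HTML = """
<html><body>
  <div class="text">Python is fun. Python is popular!</div>
</body></html>
"""

# Same punctuation set the tutorial strips from each word.
SYMBOLS = "!@#$%^&*()_-+={[}]|\\;:\"<>?/., "
STRIP_TABLE = str.maketrans("", "", SYMBOLS)


def top_words(html, css_class, n=10):
    """Return the n most common words inside <div class=css_class> elements."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = soup.find_all("div", {"class": css_class})

    # An empty match is not an error in BeautifulSoup, so report it explicitly.
    if not blocks:
        print(f"No <div class={css_class!r}> found; nothing to count.")
        return []

    counts = Counter()
    for block in blocks:
        for word in block.text.lower().split():
            cleaned = word.translate(STRIP_TABLE)  # drop punctuation
            if cleaned:
                counts[cleaned] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_words(SAMPLE_HTML, "entry-content"))  # warning, then []
    print(top_words(SAMPLE_HTML, "text", n=2))      # [('python', 2), ('is', 2)]
```

With the real page you would pass requests.get(url).text as the html argument. Which class name actually holds the article text depends on the page's current markup, so check it in the browser inspector first.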