Word count in Python

Time:11-19

I want to count the words in text taken from a website. I am trying the following code:

import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen

def get_text(url):
  page = urlopen(url)
  soup = BeautifulSoup(page, "lxml")
  text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
  return soup.title.text, text

number_of_words = 0

url = input('Enter URL - ')
text = get_text(url)

I want to calculate the word count for this text variable.

Taking https://www.ibm.com/in-en/cloud/learn/what-is-artificial-intelligence as the URL, everything works well except for getting the word count of the text variable.

P.S. - The word_count value passed in as a parameter and the word count of the generated summary differ.

I have also managed to get the character length of the original text retrieved from the URL using the following code:

print('Text character length - ', len(str(text)))

CodePudding user response:

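len(str(text)) counts characters, not words. To count words, split the text on whitespace and count the pieces: len(str(text).split()). Note that get_text returns a (title, text) tuple, so str(text) here is the string form of the whole tuple, title included: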

from bs4 import BeautifulSoup
from urllib.request import urlopen


def get_text(url):
    page = urlopen(url)
    soup = BeautifulSoup(page, "lxml")
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return soup.title.text, text


url = input('Enter URL - ')

text = get_text(url)
number_of_words = len(str(text).split())
print(number_of_words)

output:

1080
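
A minimal sketch of the difference between the character count, the word count of the paragraph text alone, and the word count of the stringified tuple. It assumes get_text returns a (title, text) tuple as in the code above. The count_words helper and the sample strings are made up for illustration and are not from the original post; no network access is needed.

```python
def count_words(text):
    """Count whitespace-separated tokens in a string."""
    return len(text.split())


# Hypothetical stand-in for the (title, text) tuple that get_text(url) returns.
result = ("What is AI?", "Artificial intelligence\nleverages computers and machines.")

# Unpack the tuple instead of calling str() on it.
title, body = result

char_count = len(body)                  # characters, including spaces and newlines
word_count = count_words(body)          # words in the paragraph text only
repr_count = len(str(result).split())   # tokens in the tuple's string form

print('Text character length - ', char_count)
print('Word count - ', word_count)
print('Word count of str(tuple) - ', repr_count)
```

In this sample the body has 6 words, while the stringified tuple yields 8 tokens: the title's words are included, and the newline is rendered as a literal backslash-n that glues two words together. Unpacking first avoids both effects, so the result for a real page may differ from a count taken on str(text).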