I want to calculate the word count of the text scraped from a website. I am trying the following code:
from bs4 import BeautifulSoup
from urllib.request import urlopen

def get_text(url):
    page = urlopen(url)
    soup = BeautifulSoup(page, "lxml")
    # Join the text of every <p> element into one string
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    # Returns a (title, paragraph text) tuple
    return soup.title.text, text

number_of_words = 0
url = input('Enter URL - ')
text = get_text(url)
I want to calculate the word count of this text variable.
With https://www.ibm.com/in-en/cloud/learn/what-is-artificial-intelligence as the URL, everything works except getting the word count of the text variable.
P.S. The word_count variable I pass in as a parameter and the word count of the generated summary differ.
I have also managed to get the character length of the original text retrieved from the URL with the following code:
print('Text character length - ', len(str(text)))
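For comparison, here is a small offline sketch of what each expression measures. The `result` tuple is a hypothetical stand-in for the `(title, text)` pair that `get_text()` returns; the strings are made up and are not taken from the IBM page.

```python
# Hypothetical stand-in for the (title, text) tuple returned by get_text();
# these strings are invented for illustration only.
result = ("What is AI?", "Artificial intelligence leverages computers. It mimics the mind.")

# Unpack the tuple instead of calling str() on it.
title, body = result

# len() on a string counts characters (letters, spaces and punctuation).
print(len(body))

# str() on the whole tuple also includes the title, quotes, comma and parentheses.
print(len(str(result)))

# split() with no argument splits on runs of whitespace, so len() of the
# resulting list is the number of whitespace-separated words.
print(len(body.split()))         # 8
print(len(str(result).split()))  # 11: the 3 title words are counted too
```

With the real page, the same idea applies: unpack the return value and count `body.split()` if only the paragraph text should be counted.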
Answer:
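len(str(text)) counts characters (letters, spaces and punctuation), not words. To count words, split the text on whitespace and take the length of the resulting list, len(str(text).split()). Note that get_text returns a (title, text) tuple, so str(text) also includes the page title: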
from bs4 import BeautifulSoup
from urllib.request import urlopen

def get_text(url):
    page = urlopen(url)
    soup = BeautifulSoup(page, "lxml")
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return soup.title.text, text

url = input('Enter URL - ')
text = get_text(url)  # (title, paragraph text) tuple
# split() breaks the string on whitespace; len() counts the pieces.
# str(text) includes the title too; use text[1].split() for the body alone.
number_of_words = len(str(text).split())
print(number_of_words)
output:
1080
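One caveat: split() treats any whitespace-separated token as a word, so a lone "-" or other punctuation between spaces is counted. The sketch below is one possible alternative, assuming a "word" means a run of word characters optionally joined by internal hyphens or apostrophes; `WORD_RE` and `count_words` are hypothetical names, and the pattern is a definition choice rather than a standard.

```python
import re

# Hypothetical definition of a "word": word characters, optionally joined
# by internal hyphens or apostrophes (e.g. "isn't", "state-of-the-art").
WORD_RE = re.compile(r"\w+(?:[-']\w+)*")

def count_words(text: str) -> int:
    """Count words, ignoring tokens made only of punctuation."""
    return len(WORD_RE.findall(text))

sample = "AI - artificial intelligence - isn't new."
print(len(sample.split()))  # 7: the two lone "-" tokens are counted
print(count_words(sample))  # 5
```

To use it with the code above, pass the paragraph text alone, e.g. `count_words(text[1])`. Because tools can define "word" differently, counts from different sources (such as a summarizer) may not agree exactly.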