I am trying to count the number of words on different websites, but I get "TypeError: 'str' object does not support item assignment". Here is my code:
import requests
from bs4 import BeautifulSoup as BS
URL = "https://www.coach.com/shop/women/handbags/view-all"
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)'}
page = requests.get(URL, headers = headers)
html_content= page.text
soup = BS(html_content, "lxml")
content = {}
try:
    text_counter = 0
    x = soup.find_all("h2")
    for y in x:
        title_length = len(y.get_text().split())
        text_counter = title_length
        content = y.findNext('p').get_text()
        content_length = len(y.findNext('p').get_text().split())
        text_counter = content_length
    t = soup.find_all("h3")
    for q in t:
        title_length = len(q.get_text().split())
        text_counter = title_length
        content = q.findNext('p').get_text()
        content_length = len(q.findNext('p').get_text().split())
        text_counter = content_length
    content["n_words"] = text_counter
except:
    content["n_words"] = ""
Full trace:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-34168cb72272> in <module>
     26         text_counter = content_length
---> 27     content["n_words"] = text_counter
     28 except:

TypeError: 'str' object does not support item assignment

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-2-34168cb72272> in <module>
     27     content["n_words"] = text_counter
     28 except:
---> 29     content["n_words"] = ""

TypeError: 'str' object does not support item assignment
Answer:
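You have two variables with the same name, content. The dictionary is created first, and the loop then rebinds the name to a string:
content = {}
content = q.findNext('p').get_text()
After the loop, content["n_words"] = text_counter is executed on that string, which raises the TypeError. The bare except block then tries the same kind of assignment, which is why the trace shows the error twice. Just give the dictionary a different name, e.g. dcontent or word_counter: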
dcontent = {}  # <-- d stands for dictionary
try:
    text_counter = 0
    t = soup.find_all("h2")
    # ... same
    t = soup.find_all("h3")
    for q in t:
        title_length = len(q.get_text().split())
        text_counter = title_length
        content = q.findNext('p')  # <-- here the content from the soup (a Tag, or None if no <p> follows)
        if content is not None and content.get_text() != '':
            # split the tag's text, not the Tag itself
            content_length = len(content.get_text().split())
            text_counter = content_length
    dcontent["n_words"] = text_counter  # <-- here update the dictionary
except Exception as e:
    print(e)
    dcontent["n_words"] = ""
print(dcontent)
#{'n_words': 52}
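To see the name clash in isolation, here is a minimal sketch that needs neither the network nor BeautifulSoup. The function names count_shadowed and count_renamed and the sample paragraphs list are made up for illustration. It also assumes that the goal is a running total, so the counter is accumulated with += rather than overwritten.

```python
def count_shadowed(paragraphs):
    """Reproduces the problem: `content` starts as a dict and ends up a str."""
    content = {}
    counter = 0
    for text in paragraphs:
        content = text  # the name now refers to a str, the dict is lost
        counter += len(content.split())
    content["n_words"] = counter  # TypeError if the loop ran at least once
    return content


def count_renamed(paragraphs):
    """Same logic, but the dict keeps its own name."""
    result = {}
    counter = 0
    for text in paragraphs:
        content = text
        counter += len(content.split())
    result["n_words"] = counter
    return result


# Hypothetical sample data standing in for the scraped paragraphs.
paragraphs = ["one two three", "four five"]

try:
    count_shadowed(paragraphs)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)                    # 'str' object does not support item assignment
print(count_renamed(paragraphs))  # {'n_words': 5}
```

Catching TypeError explicitly, instead of a bare except, also avoids repeating the failing assignment inside the handler.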
Remark:
- use tag.get_text() != '' to check if the tag contains a string, and not tag.string is not None as I said in a comment (tag.string is None whenever the tag has more than one child, even when it does contain text)
- apply such a filter always in such situations, which means also for the h2 case
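The two points of the remark can be sketched as follows. This is only an illustration under assumptions: SAMPLE_HTML is invented markup standing in for a downloaded page, count_words is a hypothetical helper, the built-in "html.parser" is used instead of "lxml" so that nothing extra has to be installed, and the counter is accumulated with += on the assumption that a total over all headings is wanted.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for a real page.
SAMPLE_HTML = """
<h2>First title</h2>
<p>Hello <b>big</b> world</p>
<h3>Second heading here</h3>
<p></p>
<h3>Last one</h3>
"""


def count_words(html):
    """Count words in every h2/h3 and in the next <p> after each of them."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0
    for heading in soup.find_all(["h2", "h3"]):
        total += len(heading.get_text().split())
        paragraph = heading.find_next("p")
        # Same filter for h2 and h3: skip a missing or empty paragraph.
        if paragraph is not None and paragraph.get_text() != "":
            total += len(paragraph.get_text().split())
    return total


first_p = BeautifulSoup(SAMPLE_HTML, "html.parser").find("p")
print(first_p.string)        # None, because the <p> has several children
print(first_p.get_text())    # Hello big world
print(count_words(SAMPLE_HTML))  # 10
```

The first paragraph shows why the get_text() check is the safer one: it contains text, but because of the nested b tag its .string is None, so a filter based on .string would skip it.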