Saving scraped content into a dictionary in Python

Time:06-06

I am trying to count the number of words on different websites, but I get "TypeError: 'str' object does not support item assignment". Here is my code:

import requests
from bs4 import BeautifulSoup as BS
URL = "https://www.coach.com/shop/women/handbags/view-all"
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)'}
page = requests.get(URL, headers = headers)
html_content= page.text
soup = BS(html_content, "lxml")
content = {}
try:
    text_counter = 0
    x = soup.find_all("h2")
    for y in x:
        title_length = len(y.get_text().split())
        text_counter += title_length
        content = y.findNext('p').get_text()
        content_length = len(y.findNext('p').get_text().split())
        text_counter += content_length

    t = soup.find_all("h3")
    for q in t:
        title_length = len(q.get_text().split())
        text_counter += title_length
        content = q.findNext('p').get_text()
        content_length = len(q.findNext('p').get_text().split())
        text_counter += content_length
    content["n_words"] = text_counter
except:
    content["n_words"] = ""

Full trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-34168cb72272> in <module>
     26         text_counter += content_length
---> 27     content["n_words"] = text_counter
     28 except:

TypeError: 'str' object does not support item assignment

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-2-34168cb72272> in <module>
     27     content["n_words"] = text_counter
     28 except:
---> 29     content["n_words"] = ""

TypeError: 'str' object does not support item assignment

CodePudding user response:

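You have two variables with the same name:

  • content = {}
  • content = q.findNext('p').get_text()

The second assignment rebinds content to a string, so content["n_words"] = ... fails, and the identical assignment in the except block raises the same error again. Rename one of them, for example the global dictionary, to something else such as dcontent or word_counter.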
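The clash can be seen in isolation, without any scraping. This is only a minimal sketch: the sample string and the names result and paragraph are made up for illustration.

```python
# Minimal reproduction of the name clash (sample text is made up).
content = {}                   # intended as the result dictionary
content = "Leather tote bag"   # rebinding: `content` is now a str

try:
    content["n_words"] = 3     # item assignment on a str raises TypeError
except TypeError as err:
    message = str(err)
    print(message)

# With distinct (hypothetical) names, the dictionary is never overwritten.
result = {}
paragraph = "Leather tote bag"
result["n_words"] = len(paragraph.split())
print(result)
```

Applied to the scraping code, the rename looks like this: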

dcontent = {}  # <-- d stands for dictionary
try:
    text_counter = 0

    t = soup.find_all("h2")
    # ... same

    t = soup.find_all("h3")
    for q in t:
        title_length = len(q.get_text().split())
        text_counter += title_length
        content = q.findNext('p')  # <-- here the content from the soup (a Tag, or None)
        if content is not None and content.get_text() != '':
            content_length = len(content.get_text().split())  # split the text, not the Tag
            text_counter += content_length
    dcontent["n_words"] = text_counter  # <-- here update the dictionary, once, after the loops
except Exception as e:
    print(e)
    dcontent["n_words"] = ""

print(dcontent)
#{'n_words': 52}
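The same idea can be tried offline. The following is a sketch under assumptions, not the asker's exact script: SAMPLE_HTML, count_words and heading_tags are hypothetical names, the markup is invented, and the built-in "html.parser" is used instead of "lxml" so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<h2>New arrivals</h2><p>Fresh styles for the season</p>
<h3>Tote bags</h3><p></p>
<h3>Care guide</h3>
"""


def count_words(html, heading_tags=("h2", "h3")):
    """Count words in each heading and in the first <p> that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0
    for heading in soup.find_all(list(heading_tags)):
        total += len(heading.get_text().split())
        paragraph = heading.find_next("p")  # None when no <p> follows
        if paragraph is not None:
            # An empty paragraph contributes zero words.
            total += len(paragraph.get_text().split())
    # The dictionary gets its own name and is built once, at the end.
    return {"n_words": total}


word_counts = count_words(SAMPLE_HTML)
print(word_counts)
```

Handling both heading levels in one loop and returning the dictionary from a function keeps the result name out of reach of the loop variables, so the clash cannot recur.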

Remark:

  • Use tag.get_text() != '' to check whether the tag contains text, rather than tag.string is not None as I suggested in a comment.
  • Apply this check in every such case, which means in the h2 loop as well.
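
The difference between the two checks shows up on a paragraph with nested tags. The snippet below is a small sketch on invented markup (the three paragraphs and the variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p>plain text</p><p>mixed <b>bold</b> text</p><p></p>",
    "html.parser",
)
plain, mixed, empty = soup.find_all("p")

# .string is only set when the tag has exactly one child;
# get_text() always returns a str, possibly empty.
print(plain.string, repr(plain.get_text()))
print(mixed.string, repr(mixed.get_text()))
print(empty.string, repr(empty.get_text()))
```

Here mixed.string is None although the paragraph clearly has text, so a filter based on tag.string would skip it, while the get_text() check only skips the empty paragraph.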