NoneType error when calling .lower() method on annotated text

I have a list of annotated articles (len=488), and I want to apply the .lower() method to the lemmas. I get the following error: AttributeError: 'NoneType' object has no attribute 'lower'. Here's the code:

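import csv
import pickle

import stanza

# Read every CSV row into a list; the with-block closes the file afterwards
with open("Guardian_Syria_text.csv", mode="r", encoding='utf-8-sig') as file:
    data = list(csv.reader(file, delimiter=","))

with open('List.p', 'wb') as f:
    pickle.dump(data, f)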

stanza.download('en')
nlp = stanza.Pipeline(lang='en',
                      processors='tokenize,pos,lemma',
                      use_gpu=True)

data_list = pickle.load(open('List.p', 'rb'))
new_list = []

for article in data_list:
  a = nlp(str(article))
  new_list.append(a)
pickle.dump(new_list, open('Annotated.p', 'wb'))

annot_data = pickle.load(open('Annotated.p', 'rb'))
pos_tags = {'NOUN', 'VERB', 'ADJ', 'ADV', 'X'}
lemmas = []

for article in annot_data:
  art_tokens = [w.text for s in article.sentences for w in s.words]
  art_lemmas = [w.lemma.lower() for s in article.sentences for w in s.words
                if w.upos in pos_tags]
  lemmas.append(art_lemmas)

I checked annot_data for None with print(annot_data is None), but it returned False. I also tried filtering it with clean = [x for x in annot_data if x != None], but clean has the same length as the original (488), and the code raises the same error when I use clean instead of annot_data.

Where's the supposed NoneType and how can I avoid it?

CodePudding user response:

The error refers to w.lemma.lower(), so the problem is that w.lemma is None, not that article is None.
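
To see why filtering annot_data did not help, here is a minimal sketch that locates the words whose lemma is None. It does not call stanza: make_doc and find_missing_lemmas are hypothetical helpers, and the SimpleNamespace objects are stand-ins that only model the attributes used in the question (sentences, words, text, lemma, upos). The sample words are made up for illustration.

```python
from types import SimpleNamespace


def make_doc(words):
    """Build a stand-in for one annotated article (NOT a real stanza Document)."""
    return SimpleNamespace(sentences=[SimpleNamespace(words=[
        SimpleNamespace(text=t, lemma=l, upos=u) for t, l, u in words])])


# Made-up sample data: the second article contains one word without a lemma
annot_data = [
    make_doc([("Rebels", "rebel", "NOUN"), ("advanced", "advance", "VERB")]),
    make_doc([("Syria", "Syria", "PROPN"), ("???", None, "X")]),
]


def find_missing_lemmas(docs):
    """Return (article index, word text, upos) for every word whose lemma is None."""
    return [(i, w.text, w.upos)
            for i, doc in enumerate(docs)
            for s in doc.sentences
            for w in s.words
            if w.lemma is None]


# Neither the list nor any article in it is None; only a nested word.lemma is.
print(annot_data is None)                  # False
print(any(d is None for d in annot_data))  # False
print(find_missing_lemmas(annot_data))     # [(1, '???', 'X')]
```

Running the same kind of loop over the real annot_data should show which tokens came back without a lemma.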

You can check for this in the list comprehension.

  art_lemmas = [w.lemma.lower() for s in article.sentences for w in s.words
                if w.lemma is not None and w.upos in pos_tags]
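
If you would rather keep those words than drop them, one alternative (not something the original answer proposes) is to fall back to the word's surface text when the lemma is missing. This is a sketch under the same assumptions as the question's code: safe_lemmas is a hypothetical helper name, and the article below is a made-up stand-in built from SimpleNamespace rather than a real stanza document.

```python
from types import SimpleNamespace

pos_tags = {'NOUN', 'VERB', 'ADJ', 'ADV', 'X'}


def safe_lemmas(article, keep_tags):
    """Lowercased lemmas for words with a wanted tag.

    Falls back to the surface text when a word has no lemma.
    """
    return [(w.lemma if w.lemma is not None else w.text).lower()
            for s in article.sentences
            for w in s.words
            if w.upos in keep_tags]


# Hypothetical stand-in for one annotated article
article = SimpleNamespace(sentences=[SimpleNamespace(words=[
    SimpleNamespace(text="Rebels", lemma="rebel", upos="NOUN"),
    SimpleNamespace(text="XyZ", lemma=None, upos="X"),
    SimpleNamespace(text="the", lemma="the", upos="DET"),
])])

print(safe_lemmas(article, pos_tags))  # ['rebel', 'xyz']
```

Whether to skip or fall back depends on what the lemmas are used for: skipping keeps the list clean, while falling back preserves the token count per article.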