Chunking - regular expressions and trees

Time:09-03

I'm a total noob so sorry if I'm asking something obvious. My question is twofold, or rather it's two questions in the same topic:

  1. I'm studying NLTK at uni, and we're doing chunking. My notes contain the following grammar:
grammar = r"""
            NP: {<DT|PP\$>?<JJ>*<NN.*> } # noun phrase
            PP: {<IN><NP>}               # prepositional phrase
            VP: {<MD>?<VB.*><NP|PP>}     # verb phrase
            CLAUSE: {<NP><VP>}           # full clause
        """

What is the "$" symbol for in this case? I know it's "end of the line" in regex, but what does it stand for here?

  2. Also, in my textbook there's a Tree that's been printed without using the .draw() function, with this result:
Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])

How the heck does one do that???

Thanks in advance to anybody who'll have the patience to school this noob :D

CodePudding user response:

This is the code of your example:

import nltk

sentence = [('the', 'DT'), ('book', 'NN'), ('has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')]

grammar = r"""
            NP: {<DT|PP\$>?<JJ>*<NN.*> } # noun phrase
            PP: {<IN><NP>}               # prepositional phrase
            VP: {<MD>?<VB.*><NP|PP>}     # verb phrase
            CLAUSE: {<NP><VP>}           # full clause
        """

cp = nltk.RegexpParser(grammar) 
result = cp.parse(sentence)

print(result)

# Output (print may wrap the tree over several lines):
# (S (CLAUSE (NP the/DT book/NN) (VP has/VBZ (NP many/JJ chapters/NNS))))


result.draw()

The tree of:

Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
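The Tree(...) form in the question looks like the tree's repr rather than its printed form. A minimal sketch of how to get it, assuming the textbook tree came from a grammar with only the NP rule (the name np_grammar is made up for illustration):

```python
import nltk

sentence = [('the', 'DT'), ('book', 'NN'), ('has', 'VBZ'),
            ('many', 'JJ'), ('chapters', 'NNS')]

# Assumption: only the NP rule is used, so 'has' stays outside any chunk.
np_grammar = r"NP: {<DT|PP\$>?<JJ>*<NN.*>}"
tree = nltk.RegexpParser(np_grammar).parse(sentence)

# print(tree) uses str() and gives the bracketed form: (S (NP the/DT ...) ...)
print(tree)

# repr() gives the constructor-style form: Tree('S', [Tree('NP', [...]), ...])
text = repr(tree)
print(text)
```

Typing the variable name at the interactive prompt (tree, without print) shows the same repr in a plain Python shell.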


I found this link where you can learn a lot.

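The $ symbol is a special character in regular expressions, so it must be backslash-escaped to match it literally. Here it is part of the tag name PP$, the possessive-pronoun tag (as in "her" or "my"), so <DT|PP\$> matches a token tagged either DT or PP$.

Unescaped $ example:

xyz$  ->  matches the pattern xyz only at the end of a string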
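To see the difference between the escaped and unescaped dollar sign, here is a small sketch using Python's re module directly. It is only an illustration of regex escaping; it does not reproduce how NLTK translates tag patterns internally, and the tags list is made up for the example.

```python
import re

# Hypothetical list of tags to test against.
tags = ["PP$", "PP", "DT"]

# Escaped: \$ is a literal dollar sign, so only the tag "PP$" matches.
escaped = re.compile(r"PP\$")

# Unescaped: $ is the end-of-string anchor, so this matches the tag "PP".
unescaped = re.compile(r"PP$")

escaped_hits = [t for t in tags if escaped.fullmatch(t)]
unescaped_hits = [t for t in tags if unescaped.fullmatch(t)]

print(escaped_hits)    # ['PP$']
print(unescaped_hits)  # ['PP']
```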