I'm a total noob so sorry if I'm asking something obvious. My question is twofold, or rather it's two questions in the same topic:
- I'm studying nltk in Uni, and we're doing chunks. In the grammar I have on my notes the following code:
grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN.*> } # noun phrase
PP: {<IN><NP>} # prepositional phrase
VP: {<MD>?<VB.*><NP|PP>} # verb phrase
CLAUSE: {<NP><VP>} # full clause
"""
What is the "$" symbol for in this case? I know it's "end of the line" in regex, but what does it stand for here?
- Also, in my text book there's a Tree that's been printed without using the .draw() function, to this result:
Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
How the heck one does that???
Thanks in advance to anybody who'll have the patience to school this noob :D
CodePudding user response:
This is the code of your example:
import nltk
sentence = [('the', 'DT'), ('book', 'NN'), ('has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')]
grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN.*> } # noun phrase
PP: {<IN><NP>} # prepositional phrase
VP: {<MD>?<VB.*><NP|PP>} # verb phrase
CLAUSE: {<NP><VP>} # full clause
"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
#output
#(S(CLAUSE (NP the/DT book/NN) (VP has/VBZ (NP many/JJ chapters/NNS))))
result.draw()
The tree of:
Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
I found this link where you can learn a lot.
The $ symbol is a special character in regular expressions, and must be backslash escaped in order to match the tag PP$.
$ Example:
Xyz$ -> Used to match the pattern xyz at the end of a string