Frequency Distribution of Bigrams

Time:12-14

I have done the following:

import nltk


words = nltk.corpus.brown.words()
freq = nltk.FreqDist(words)

and I am able to find the frequency of individual words in the Brown corpus, for example:

freq["the"]
62713

Now I want to find the frequency of specific bigrams, so I tried:

bigrams = nltk.bigrams(words)
freqbig = nltk.FreqDist(bigrams)

But for every bigram I look up, I always get 0. For example:

freqbig["the man"]
0

What am I doing wrong?

Answer:

nltk.bigrams() yields tuples of adjacent words, so the keys of freqbig are tuples, not strings. Look the bigram up as a tuple:

freqbig[("the", "man")]

OUTPUT

128
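To see why the string lookup silently returns 0 instead of raising an error, here is a minimal self-contained sketch. It assumes a small hypothetical token list in place of the Brown corpus, builds bigrams with zip() (the same adjacent-pair tuples that nltk.bigrams() yields), and uses collections.Counter as a stand-in for nltk.FreqDist, which subclasses Counter and therefore returns 0 for any key it has never seen.

```python
from collections import Counter

# Hypothetical toy tokens standing in for nltk.corpus.brown.words().
tokens = ["the", "man", "saw", "the", "man", "with", "the", "dog"]

# Pairs of adjacent tokens: the same (word1, word2) tuples nltk.bigrams() produces.
bigrams = list(zip(tokens, tokens[1:]))

# Counter stands in for nltk.FreqDist (a Counter subclass).
freqbig = Counter(bigrams)

print(bigrams[0])               # ('the', 'man')
print(freqbig["the man"])       # 0: no key equals the string "the man"
print(freqbig[("the", "man")])  # 2: the tuple key matches
```

Because a missing key is not an error for a Counter, a key of the wrong type looks exactly like a bigram that never occurs, which is what made the original lookups return 0.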

If you want to look bigrams up by string, you could write a small helper function that converts the string into a tuple first:

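def get_frequency(my_string):
    # "the man" -> ("the", "man"), matching the tuple keys of freqbig;
    # split() with no argument also tolerates repeated spaces
    return freqbig[tuple(my_string.split())]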
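A slightly more defensive variant of the helper might also reject input that is not exactly two words, since a three-word string would otherwise just return 0. This is a sketch under assumptions: the token list is hypothetical toy data, Counter again stands in for nltk.FreqDist, and the freqdist parameter is an illustrative addition so the function is not tied to one global.

```python
from collections import Counter

# Hypothetical toy data; with NLTK this would be nltk.FreqDist(nltk.bigrams(words)).
tokens = ["the", "man", "saw", "the", "man", "with", "the", "dog"]
freqbig = Counter(zip(tokens, tokens[1:]))

def get_frequency(bigram_string, freqdist=freqbig):
    """Return the count of a space-separated bigram such as "the man"."""
    words = tuple(bigram_string.split())
    if len(words) != 2:
        raise ValueError(f"expected exactly two words, got {len(words)}: {bigram_string!r}")
    return freqdist[words]

print(get_frequency("the man"))   # 2
print(get_frequency("the  dog"))  # 1 (extra whitespace is ignored by split())
print(get_frequency("man the"))   # 0 (word order matters)
```

Note that lookups are also case-sensitive, so "The man" and "the man" are counted as different bigrams.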