Ngram in python with start_pad

Time:04-14

I'm new to Python. I have learned some basics about lists and tuples, but I don't fully understand my code. I want to create a list with three elements, where each element is a tuple with two items, like this: [('~','a'),('a','b'),('b','c')]. The first item in each tuple is the context, and its length depends on the context length; with two characters it would look like this: [('~a','a'),('ab','b'),('bc',' c')]. Can anyone help me? Here is my code:


def getNGrams(wordlist, n):
    ngrams = []
    padded_tokens = "~"*(n) + wordlist
    t = tuple(wordlist)
    for i in range(3):
        t = tuple(padded_tokens[i:i+n])
        ngrams.append(t)
    return ngrams

CodePudding user response:

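IIUC, you can change the function as shown below to get what you want. Each tuple pairs the n characters that precede a position (the context) with the character at that position: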

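def getNGrams(wordlist, n):
    ngrams = []
    # Prepend n pad characters so the first character has a full context
    padded_tokens = "~"*n + wordlist
    for i in range(len(wordlist)):
        # padded_tokens[i:i+n] is the n characters before wordlist[i]
        t = (padded_tokens[i:i+n], wordlist[i])
        ngrams.append(t)
    return ngrams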

print(getNGrams('abc',1))
print(getNGrams('abc',2))
print(getNGrams('abc',3))

Output:

[('~', 'a'), ('a', 'b'), ('b', 'c')]
[('~~', 'a'), ('~a', 'b'), ('ab', 'c')]
[('~~~', 'a'), ('~~a', 'b'), ('~ab', 'c')]
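
The same idea can be extended to a list of word tokens instead of a single string. The following is only a sketch under assumptions: the function name get_context_pairs and the pad parameter are hypothetical and not part of the answer above, and the context is returned as a tuple of tokens rather than a joined string.

```python
def get_context_pairs(tokens, n, pad="~"):
    """Return (context, token) pairs, where context is the n items before token."""
    tokens = list(tokens)          # accept a string or any iterable of tokens
    padded = [pad] * n + tokens    # n pad symbols in front of the sequence
    # padded[i:i + n] holds the n items that precede tokens[i]
    return [(tuple(padded[i:i + n]), tokens[i]) for i in range(len(tokens))]


print(get_context_pairs("abc", 2))
print(get_context_pairs(["the", "cat", "sat"], 2))
```

Building the padding as a list keeps the slicing logic identical for strings and token lists; if string contexts are preferred for character input, the context tuple can be joined with "".join(...).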