How to tokenize a string in consecutive pairs using Python?

My input is "I like to play basketball", and the output I am looking for is "I like", "like to", "to play", "play basketball". I have used NLTK's word_tokenize, but that gives single tokens only. I have statements like this in a huge database, and this pairwise tokenization has to be run on an entire column.

CodePudding user response:

You can use a list comprehension for that:

>>> a = "I like to play basketball"
>>> b = a.split()
>>> c = [" ".join([b[i], b[i + 1]]) for i in range(len(b) - 1)]
>>> c
['I like', 'like to', 'to play', 'play basketball']
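
The same pairing can also be written with zip, which matches each word with the one that follows it. This is only a minimal sketch, and the function name word_pairs is made up for illustration:

```python
def word_pairs(text):
    """Return consecutive word pairs from a whitespace-separated string."""
    words = text.split()
    # zip stops at the shorter input, so the last word is not paired with anything
    return [f"{first} {second}" for first, second in zip(words, words[1:])]


print(word_pairs("I like to play basketball"))
# ['I like', 'like to', 'to play', 'play basketball']
```

A string with fewer than two words gives an empty list, so rows with a single word or no text do not raise an error.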

CodePudding user response:

You could do it like this:

s = 'I like to play basketball'
t = s.split()
for i in range(len(t)-1):
    print(' '.join(t[i:i + 2]))
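
Since the question asks about running this on an entire column, here is a rough sketch that collects the pairs instead of printing them and applies the slicing approach to every row. The names pairwise_tokens and column are assumptions, and the list stands in for the real database column:

```python
def pairwise_tokens(sentence):
    """Return consecutive word pairs for one sentence, using slices of length 2."""
    tokens = sentence.split()
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


# Hypothetical stand-in for the values of the database column
column = ["I like to play basketball", "Hello world", "Single"]

# One list of pairs per row; rows with fewer than two words give an empty list
pairs_per_row = [pairwise_tokens(row) for row in column]

for row, pairs in zip(column, pairs_per_row):
    print(row, "->", pairs)
```

If the column happens to be a pandas Series (the question does not say how the data is stored), something like df["text"].apply(pairwise_tokens) should give the same result per row, where df and "text" are placeholder names.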