My input is "I like to play basketball", and the output I am looking for is "I like", "like to", "to play", "play basketball". I have used NLTK's word_tokenize, but that gives single tokens only. I have statements of this type in a huge database, and this pairwise tokenization has to be run on an entire column.
CodePudding user response:
You can use a list comprehension for that:
>>> a = "I like to play basketball"
>>> b = a.split()
>>> c = [" ".join([b[i], b[i+1]]) for i in range(len(b)-1)]
>>> c
['I like', 'like to', 'to play', 'play basketball']
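The same pairing can be sketched without index arithmetic by zipping the word list with itself shifted by one. The function name word_pairs is only illustrative and not part of any library:

```python
def word_pairs(sentence):
    """Return adjacent word pairs of a whitespace-separated sentence."""
    words = sentence.split()
    # zip stops at the shorter input, so the last word gets no partner
    return [" ".join(pair) for pair in zip(words, words[1:])]

print(word_pairs("I like to play basketball"))
# ['I like', 'like to', 'to play', 'play basketball']
```

A sentence with fewer than two words simply yields an empty list.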
CodePudding user response:
You could do it like this:
s = 'I like to play basketball'
t = s.split()
for i in range(len(t)-1):
    print(' '.join(t[i:i+2]))
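To run this over a whole column rather than one string, the loop can be wrapped in a function and applied to each cell. The sketch below assumes the column has already been read into a plain Python list; the names bigrams and column are hypothetical, and treating missing cells as empty is an assumption about the data:

```python
def bigrams(text):
    """Adjacent word pairs for one cell; non-string cells give an empty list."""
    if not isinstance(text, str):
        return []
    words = text.split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

# Hypothetical values standing in for rows fetched from the database
column = ["I like to play basketball", "hello", None]
result = [bigrams(cell) for cell in column]
print(result)
# [['I like', 'like to', 'to play', 'play basketball'], [], []]
```

If the column lives in a pandas Series instead, passing the same function to Series.apply should give one list of pairs per row.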