Split a text by word count in Python

I have a text as below:

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."

I want to split it into a new pandas DataFrame with one row for every 5 words, as below:

id	Text
0	I have an Apple. I
1	have a Banana. I have
2	an Orange. I have a
3	Watermelon.

Any help is much appreciated!

Answer:

This would work:

import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
words = text.split()  # split on whitespace; punctuation stays attached ("Apple.")
sentences = []
for i in range(0, len(words), 5):
    sentence = words[i:i+5]  # up to 5 words starting at position i
    sentence = ' '.join(sentence)
    sentences.append(sentence)
series = pd.Series(sentences)
df = series.to_frame()
df.columns = ['Text']

The resulting dataframe df then looks like this, which matches the output specified in the question:

                    Text
0     I have an Apple. I
1  have a Banana. I have
2    an Orange. I have a
3            Watermelon.
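The same loop can be condensed into a list comprehension. The sketch below wraps it in a helper, chunk_words, with the chunk size n as a parameter; the function name and its parameter are illustrative assumptions, not part of pandas.

```python
import pandas as pd


def chunk_words(text, n=5):
    """Split text on whitespace and join every n consecutive words into one row.

    chunk_words is a hypothetical helper for illustration, not a pandas API.
    """
    words = text.split()
    # Step through the word list n at a time; the last slice may be shorter.
    chunks = [' '.join(words[i:i + n]) for i in range(0, len(words), n)]
    return pd.DataFrame({'Text': chunks})


text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
df = chunk_words(text, n=5)
print(df)
```

Changing n changes the chunk size without touching the loop, e.g. chunk_words(text, n=8) gives two rows of eight words each.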

Answer:

You can try groupby and then aggregate:

import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."

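# One word per row; the default index numbers the words 0, 1, 2, ...
df = pd.DataFrame({'Text': text.split()})

# index // 5 puts words 0-4 in group 0, words 5-9 in group 1, and so on;
# each group's words are then joined back into one string.
out = df.groupby(df.index//5).agg({'Text': ' '.join})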

print(out)

                    Text
0     I have an Apple. I
1  have a Banana. I have
2    an Orange. I have a
3            Watermelon.
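The question's table labels the row index as id. Assuming a named index is what is wanted, one way to get that label is rename_axis; if id should instead be an ordinary column, reset_index can move it out of the index. This is a small sketch building on the groupby answer, not a required step.

```python
import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."

# One word per row; the default RangeIndex numbers the words 0..15.
df = pd.DataFrame({'Text': text.split()})

# Integer-divide each position by 5 so positions 0-4 form group 0, 5-9 group 1, etc.
out = df.groupby(df.index // 5).agg({'Text': ' '.join})

# Name the index "id", as in the question's expected output.
out = out.rename_axis('id')

# Alternative: turn the named index into a regular "id" column.
out_with_column = out.reset_index()

print(out)
print(out_with_column)
```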