I have a text like this:
text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
I want to split it into a new pandas dataframe with one row for every 5 words, as below:
id | Text |
---|---|
0 | I have an Apple. I |
1 | have a Banana. I have |
2 | an Orange. I have a |
3 | Watermelon. |
Any help is much appreciated!
CodePudding user response:
This would work:
import pandas as pd

text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
words = text.split()  # split on whitespace
sentences = []
# Step through the word list five words at a time
for i in range(0, len(words), 5):
    sentence = words[i:i + 5]
    sentence = ' '.join(sentence)
    sentences.append(sentence)
series = pd.Series(sentences)
df = series.to_frame()
df.columns = ['Text']
The resulting dataframe df then looks like this, which matches what you specified in your question:
                    Text
0     I have an Apple. I
1  have a Banana. I have
2    an Orange. I have a
3            Watermelon.
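If the chunk size needs to vary, the same loop can be wrapped in a small helper. This is only a sketch: `chunk_words` and its `size` parameter are hypothetical names chosen for illustration, not part of the answer above.

```python
def chunk_words(text, size=5):
    """Split text on whitespace and rejoin it in groups of `size` words."""
    words = text.split()
    # Slice the word list in steps of `size`; the last slice may be shorter
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
chunks = chunk_words(text)
print(chunks)
```

The resulting list can then be passed to pd.DataFrame({'Text': chunks}) to build the same dataframe as above.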
CodePudding user response:
You can try groupby and then aggregate:
import pandas as pd
text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
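# One row per word
df = pd.DataFrame({'Text': text.split()})
# Integer-divide the row index by 5 so every 5 consecutive words share a group key
out = df.groupby(df.index // 5).agg({'Text': ' '.join})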
print(out)
                    Text
0     I have an Apple. I
1  have a Banana. I have
2    an Orange. I have a
3            Watermelon.
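To see why grouping on the index divided by 5 works, it may help to look at the group labels on their own. The plain-Python sketch below mimics the idea without pandas; the names `labels`, `groups` and `rows` are made up for illustration, and it is assumed the dataframe has the default 0, 1, 2, ... index.

```python
text = "I have an Apple. I have a Banana. I have an Orange. I have a Watermelon."
words = text.split()

# Floor-dividing each position by 5 maps positions 0-4 to 0, 5-9 to 1, and so on,
# which is the same label that df.index // 5 assigns to each row
labels = [i // 5 for i in range(len(words))]
print(labels)

# Collect the words that share a label, then join each group with spaces
groups = {}
for label, word in zip(labels, words):
    groups.setdefault(label, []).append(word)
rows = {label: " ".join(group) for label, group in groups.items()}
print(rows)
```

Changing the divisor changes the number of words per row in the same way.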