How Can I Generate Similar Content Based On A Text File (60k characters)

Time:09-02

I want to generate content, but from a big file instead of a one-liner prompt.

My file contains 60k characters.

Is there an AI library like GPT-3 that can take a large file and generate similar content?

CodePudding user response:

You could try fine-tuning GPT-2 to fit your needs. See this article. Alternatively, if you don't need anything particularly sophisticated and can accept output that isn't very coherent, you could use Markov chains with markovify and spacy:

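import markovify
import spacy

FILE = "book.txt"

# Read the whole source text into memory.
with open(FILE, "r") as file:
    book = file.read()

# Split the text into sentences with spaCy, then build the Markov model.
nlp = spacy.load("en_core_web_sm")
doc = nlp(book)
sentences = " ".join([sent.text for sent in doc.sents if len(sent.text) > 1])
generator = markovify.Text(sentences, state_size=3)

output = ""

for _ in range(100):
    # make_sentence() returns None when it cannot build a sentence, so skip those.
    sentence = generator.make_sentence()
    if sentence:
        output += sentence + " "

print(output)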

Make sure to run python -m spacy download en_core_web_sm before you run this.
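
If it helps to see what a Markov chain generator does under the hood, here is a minimal dependency-free sketch of the idea. It is not how markovify is implemented; the function names `build_chain` and `generate` and the tiny sample text are made up for illustration. Each state is a tuple of `state_size` consecutive words, and generation is a random walk over the words observed after each state.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each tuple of `state_size` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, max_words=30, seed=None):
    """Random-walk the chain, starting from a randomly chosen state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    out = list(state)
    while len(out) < max_words:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state was only seen at the very end of the text.
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Hypothetical sample text; in practice this would be the contents of book.txt.
sample = "the cat sat on the mat and the cat ate the fish on the mat"
chain = build_chain(sample, state_size=2)
text = generate(chain, max_words=12, seed=0)
print(text)
```

A larger `state_size` makes the output follow the source more closely (and repeat it more often), while a smaller one gives more variety and less coherence. With only 60k characters, a state size of 3 will probably reproduce long stretches of the original.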

CodePudding user response:

You can have a look at a classic text generation notebook like this one from TensorFlow, which starts from a Shakespeare corpus:

https://www.tensorflow.org/text/tutorials/text_generation

Generally, I think you are looking to train or fine-tune a model on the whole corpus and then generate text starting from one sentence.
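
To give a rough idea of what "training on the whole corpus" means for a character-level model, here is a small sketch of the data preparation step only, in plain Python. It assumes the common setup where the text is cut into fixed-length chunks and the model learns to predict each next character. The helper name `make_training_pairs` and the sample corpus are hypothetical, and a real pipeline would do this with the framework's own dataset tools.

```python
def make_training_pairs(text, seq_length=100):
    """Cut text into non-overlapping chunks and return (input, target) id lists.

    The target is the input shifted one character to the right, so the model
    is trained to predict the next character at every position.
    """
    vocab = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(vocab)}
    ids = [char_to_id[ch] for ch in text]

    step = seq_length + 1  # one extra character so the target can be shifted
    pairs = []
    for start in range(0, len(ids) - step + 1, step):
        chunk = ids[start:start + step]
        pairs.append((chunk[:-1], chunk[1:]))
    return vocab, pairs


# Hypothetical stand-in for the 60k-character file.
corpus = "to be or not to be that is the question " * 50
vocab, pairs = make_training_pairs(corpus, seq_length=20)
print(len(vocab), "distinct characters,", len(pairs), "training examples")
```

With a 60k-character file and `seq_length=100`, this gives only around 600 non-overlapping examples, which is small for training from scratch. That is one reason fine-tuning a pretrained model is usually the more practical route for a corpus of this size.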
