How can I automatically make my code paraphrase the outputted paragraph in such a way that there is


My code obtains certain paragraphs from a textbook. I would like the code to spit out several versions of each paragraph automatically, using some sort of paraphrasing tool. The code is written mostly in Python, but also uses JavaScript (and TypeScript) for certain aspects.

I tried implementing several existing models, including PyTorch-based ones such as Parrot and SCPN, but it quickly became too complicated for me. Can someone help me find a better approach, or help me use a ready-made model? Thank you <3

CodePudding user response:

Paraphrasing isn't an easy task. You'd need to check what can be removed from a paragraph and still make sense. You'd need an algorithm which understands English to be able to remove what is unnecessary.

Considering you mentioned that you're on Python, you could always try Parrot.

CodePudding user response:

You might want to look at https://huggingface.co/tuner007/pegasus_paraphrase

Hugging Face hosts a wide collection of ML models and associated libraries (the transformers Python library in this case) that automatically download a model from their site and run it in a very streamlined way.

pegasus_paraphrase is one of those models. You can also read the code in the model card at the link above.

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

def get_response(input_text, num_return_sequences, num_beams):
  # inputs longer than 60 tokens are truncated, so feed one sentence at a time
  batch = tokenizer([input_text], truncation=True, padding='longest', max_length=60, return_tensors="pt").to(torch_device)
  # beam search; note that temperature has no effect unless do_sample=True is also passed
  translated = model.generate(**batch, max_length=60, num_beams=num_beams, num_return_sequences=num_return_sequences, temperature=1.5)
  tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
  return tgt_text

Sample results

num_beams = 10
num_return_sequences = 10
context = "The ultimate test of your knowledge is your capacity to convey it to another."
get_response(context,num_return_sequences,num_beams)
# output:
['The test of your knowledge is your ability to convey it.',
 'The ability to convey your knowledge is the ultimate test of your knowledge.',
 'The ability to convey your knowledge is the most important test of your knowledge.',
 'Your capacity to convey your knowledge is the ultimate test of it.',
 'The test of your knowledge is your ability to communicate it.',
 'Your capacity to convey your knowledge is the ultimate test of your knowledge.',
 'Your capacity to convey your knowledge to another is the ultimate test of your knowledge.',
 'Your capacity to convey your knowledge is the most important test of your knowledge.',
 'The test of your knowledge is how well you can convey it.',
 'Your capacity to convey your knowledge is the ultimate test.']
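One caveat: pegasus_paraphrase works on single sentences (the snippet above caps input at 60 tokens), so for a whole textbook paragraph you'd want to split it into sentences and paraphrase each one. Here's a minimal sketch; the function name and the regex splitter are illustrative, not part of the model's API, and paraphrase_fn stands in for something like the get_response above:

```python
import re

def paraphrase_paragraph(paragraph, paraphrase_fn):
    """Paraphrase a paragraph one sentence at a time.

    paraphrase_fn takes a sentence and returns a list of candidate
    paraphrases (e.g. lambda s: get_response(s, 10, 10) with the
    Pegasus setup above); the first candidate is kept per sentence.
    """
    # naive split on sentence-ending punctuation; swap in
    # nltk.tokenize.sent_tokenize for more robust splitting
    sentences = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    rewritten = [paraphrase_fn(s)[0] for s in sentences if s]
    return ' '.join(rewritten)
```

Calling this once per desired version, and keeping a different candidate index each time, gives you several paraphrased variants of the same paragraph.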