How to train an ML model converting text to code

I'm looking for a working Colab/notebook example that shows how to train or fine-tune a text generation model capable of converting "short text" -> "programming code text".

I'm learning the topic and would like to fine-tune it with a custom metric on some public GitHub repos.

All I have found so far are models that "continue a sentence" or simply generate text out of the blue. Many thanks!

CodePudding user response:

First, take a look at CodeXGLUE and its repository. The benchmark groups its tasks into four categories:

code-code (clone detection, defect detection, cloze test, code completion, code repair, and code-to-code translation)
text-code (natural language code search, text-to-code generation)
code-text (code summarization)
text-text (documentation translation)

What you want is the text-to-code generation task. Based on the CodeXGLUE benchmark, one of the best models for this task is CoTexT. CoTexT supports these programming languages: "go", "java", "javascript", "php", "python", "ruby". Pre-trained checkpoints of the model are available on Hugging Face, and there is a separate explanation of how to fine-tune it.
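Since you mentioned public GitHub repos and a custom metric, here is a minimal sketch of the two pieces you would need around any fine-tuning script: building (text, code) pairs and scoring predictions. It assumes Python 3.9+ (for `ast.unparse`) and that the first line of a function's docstring is a reasonable "short text" for that function. The names `extract_text_code_pairs` and `exact_match` are made up for this illustration and are not part of CodeXGLUE or CoTexT; the metric is just a whitespace-insensitive exact match, so swap in whatever you actually want to measure.

```python
import ast
import json


def extract_text_code_pairs(source: str) -> list[dict]:
    """Turn Python source into {"text": ..., "code": ...} training pairs.

    Assumption: the first docstring line is the natural-language input,
    and the function without its docstring is the target code.
    """
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node)
        if not doc:
            continue  # no docstring means no text side for the pair
        # Drop the docstring statement so the target does not leak the input.
        node.body = node.body[1:] or [ast.Pass()]
        pairs.append({
            "text": doc.strip().splitlines()[0],
            "code": ast.unparse(node),  # note: comments are not preserved
        })
    return pairs


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference, ignoring whitespace runs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(s: str) -> str:
        return " ".join(s.split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    sample = '''
def add(a, b):
    """Return the sum of two numbers."""
    return a + b
'''
    # One JSON object per line is a common input format for fine-tuning scripts.
    for pair in extract_text_code_pairs(sample):
        print(json.dumps(pair))
```

You would run the extractor over the `.py` files of the repos you picked, write the pairs to a JSON Lines file, and then call `exact_match` on the decoded model outputs during evaluation.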