textEmbed error about sentencepiece for DeBERTa

I get an error when using a DeBERTa model with the R package text. The call is:

textEmbed("hello", model = "microsoft/deberta-v3-base")

It fails with:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer.

User response:

To get this to work, you need to install sentencepiece in the conda environment that the text package uses. When I first did that, RStudio kept freezing. After updating RStudio and R, I created a dedicated conda environment with scipy 1.6 and sentencepiece, and with that it works without any problems:

# Create a dedicated conda environment that includes sentencepiece
text::textrpp_install(rpp_version=c("torch==1.8", "transformers==4.12.5",
                                    "numpy", "nltk",
                                    "scipy==1.6", "sentencepiece"),
                      envname = "textrpp_condaenv_sentencepiece")

# Point the text package at the new environment
text::textrpp_initialize(condaenv = "textrpp_condaenv_sentencepiece",
                         refresh_settings = TRUE)
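
Once the environment is initialized, you can check that the fix took effect before re-running the original call. The following is only a sketch, assuming the environment name used above and that reticulate (which the text package relies on) is installed. The `py_module_available()` check is my addition and not part of the original fix:

```r
library(text)

# Assumption: the conda environment "textrpp_condaenv_sentencepiece"
# was created and initialized in this session as shown above.

# Ask reticulate whether the active Python can import sentencepiece.
# This should return TRUE if the install step succeeded.
reticulate::py_module_available("sentencepiece")

# Retry the call that originally raised the ValueError.
embeddings <- textEmbed("hello", model = "microsoft/deberta-v3-base")
```

If the check returns FALSE, the R session is probably still bound to a different Python environment. Restarting R and calling textrpp_initialize() again before loading any model is worth trying.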