I am looking to use a FastText model in a ML pipeline that I made and saved as a .bin
file on s3. My hope is to keep this all in a cloud based pipeline, so I don't want local files. I feel like I am really close, but I can't figure out how to make a temporary .bin
file. I also am not sure if I am saving and reading the FastText model in the most efficient way. The below code works, but it saves the file locally which I want to avoid.
import smart_open
file = smart_open.smart_open(s3 location of .bin model)
listed = b''.join([i for i in file])
with open("ml_model.bin", "wb") as binary_file:
binary_file.write(listed)
model = fasttext.load_model("ml_model.bin")
CodePudding user response:
If you want to use the fasttext
wrapper for the official Facebook FastText code, you may need to create a local temporary copy - your troubles make it seem like that code relies on opening a local file path.
You could also try the Gensim
package's separate FastText
support, which should accept an S3 path via its load_facebook_model()
function:
https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_model
(Note, though, that Gensim doesn't support all FastText functionality, like the supervised
mode.)
CodePudding user response:
As partially answered by the above response, a temporary file was needed. But on top of that, the temporary file needed to be passed as a string object, which is sort of strange. Working code below:
import tempfile
import fasttext
import smart_open
from pathlib import Path
file = smart_open.smart_open(f's3://{bucket_name}/{key}')
listed = b''.join([i for i in file])
with tempfile.TemporaryDirectory() as tdir:
tfile = Path(tdir).joinpath('tempfile.bin')
tfile.write_bytes(listed)
model = fasttext.load_model(str(tfile))