I'm able to save a Great_Expectations suite to the tmp folder on my Databricks Community Edition as follows:
ge_partdf.save_expectation_suite('/tmp/myexpectation_suite.json', discard_failed_expectations=False)
But the problem is, when I restart the cluster the json file is no longer in the tmp folder. The reason for this, I guess, is that files in the tmp folder are temporary. However, if I try to save it to a folder that I know exists on Databricks, e.g. /FileStore/tables, I get the error message:
FileNotFoundError: [Errno 2] No such file or directory: '/FileStore/tables/myexpectation_suite.json'
Can someone let me know how to save the file persistently on Databricks, please?
CodePudding user response:
The save_expectation_suite
function uses the local Python file API and stores the data on the driver's local disk, not on DBFS - that's why the file disappeared after the restart.
If you use full Databricks (on AWS or Azure), then you just need to prepend /dbfs
to your path, and the file will be stored on DBFS via the so-called DBFS FUSE mount (see docs).
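A minimal sketch of that, assuming the same `ge_partdf` object and file name as in the question (the save call is shown commented out because it only works inside a Databricks notebook):

```python
# On full Databricks, the FUSE mount exposes DBFS under /dbfs on the
# driver's local filesystem, so an ordinary local-file write with a
# /dbfs-prefixed path lands on DBFS and survives cluster restarts.
suite_path = "/dbfs" + "/FileStore/tables/myexpectation_suite.json"

# Runs only on a full Databricks cluster:
# ge_partdf.save_expectation_suite(suite_path, discard_failed_expectations=False)
```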
On Community Edition you will need to continue using the local disk, and then use dbutils.fs.cp
to copy the file from the local disk to DBFS.
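A sketch of that workflow, again reusing `ge_partdf` and the paths from the question; the `file:` prefix tells `dbutils.fs.cp` to read from the driver's local filesystem. The Databricks-only calls are commented because `dbutils` exists only inside a notebook:

```python
# 1. Save to the driver's local disk (ephemeral, lost on restart).
local_path = "/tmp/myexpectation_suite.json"
# 2. Target location on DBFS (persistent across restarts).
dbfs_path = "dbfs:/FileStore/tables/myexpectation_suite.json"

# In a Databricks notebook:
# ge_partdf.save_expectation_suite(local_path, discard_failed_expectations=False)
# dbutils.fs.cp("file:" + local_path, dbfs_path)

# After a cluster restart, copy it back to local disk before loading:
# dbutils.fs.cp(dbfs_path, "file:" + local_path)
```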