I'm working on the deployment of the Purview ADB Lineage Solution Accelerator. In step 3 of the "Install OpenLineage on Your Databricks Cluster" section, the author asks you to run the following in PowerShell to upload the init script and jar to DBFS using the Databricks CLI:
dbfs mkdirs dbfs:/databricks/openlineage
dbfs cp --overwrite ./openlineage-spark-*.jar dbfs:/databricks/openlineage/
dbfs cp --overwrite ./open-lineage-init-script.sh dbfs:/databricks/openlineage/open-lineage-init-script.sh
Question: Do I correctly understand the above code as follows? If that is not the case, before running the code, I would like to know what exactly the code is doing.
- The first line creates a folder openlineage in the root directory of dbfs
- It's assumed that you are running the PowerShell command from the location where the .jar and open-lineage-init-script.sh are located
- The second and third lines of the code copy the jar and .sh files from your local directory to dbfs:/databricks/openlineage/ in the DBFS of Databricks
CodePudding user response:
For your first point: dbfs mkdirs is the equivalent of UNIX mkdir -p, i.e. under the DBFS root it will create a folder named databricks, and inside it another folder named openlineage, and it will not complain if these directories already exist.
For your second and third points: yes. Paths not prefixed with dbfs:/ refer to your local filesystem. Note that you can copy from DBFS to local or vice versa, or between two DBFS locations, just not from one local path to another.
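If it helps to see the mkdir -p comparison in action before touching DBFS, here is a rough local-only sketch in a POSIX shell. It does not call the Databricks CLI at all: a temporary directory stands in for the DBFS root, and the file name openlineage-spark-0.0.0.jar is a made-up placeholder (the real jar carries its own version number). It only illustrates the nested-folder creation and the fact that ./ paths resolve against the directory you run the commands from.

```shell
#!/bin/sh
set -eu

# Scratch area standing in for both the local folder and the DBFS root.
work=$(mktemp -d)
cd "$work"

# Hypothetical placeholder files; the real jar has an actual version number.
touch openlineage-spark-0.0.0.jar open-lineage-init-script.sh

# Like `dbfs mkdirs`: creates every missing parent directory in one call.
target="$work/dbfs_root/databricks/openlineage"
mkdir -p "$target"
# Running it again is not an error when the directories already exist.
mkdir -p "$target"

# Relative sources (./...) resolve against the current directory.
# In a POSIX shell the * is expanded by the shell before cp runs.
cp ./openlineage-spark-*.jar "$target/"
cp ./open-lineage-init-script.sh "$target/open-lineage-init-script.sh"

# List what ended up in the stand-in target folder.
ls "$target"
```

If the cp lines fail here with "No such file or directory", the same thing would happen with dbfs cp when you run it from a folder that does not contain the jar and the init script.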