Databricks upload data in batch


Via the Databricks Data / Table UI, we can upload small files using "Create a table using file upload".

Does Databricks have a standard batch approach for large files? Or do we need to use SFTP, Hadoop distcp, or some sort of REST service? I want to make sure I haven't missed some new development.

CodePudding user response:

The usual recommendation is to use cloud-specific tools for copying/moving data into cloud storage: azcopy, Azure Data Factory, aws-cli, etc. These tools are heavily optimized for high throughput and parallelism.
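For illustration, here is a minimal PySpark sketch of that pattern: the heavy copy happens outside Databricks (azcopy, aws-cli, ADF), and the notebook only reads the files from cloud storage. The storage account, container, path, file format, and table name below are placeholders, and the cluster is assumed to already have access to the storage (e.g. via a service principal or instance profile); `spark` is the session predefined in Databricks notebooks.

```python
# Read files that azcopy / aws-cli / ADF already landed in cloud object storage.
# The abfss path, format, and target table are illustrative placeholders.
df = (
    spark.read
    .format("parquet")
    .load("abfss://raw@mystorageaccount.dfs.core.windows.net/landing/")
)

# Persist as a managed table so it is visible in the Data / Table UI.
df.write.mode("overwrite").saveAsTable("bronze.landing_events")
```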

Uploads to DBFS should really be limited to small files used for data exploration. Technically you can upload larger files to DBFS, for example with the Databricks CLI (see the sketch below), but that path is not optimized for large volumes of data.
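If you do want to script small uploads to DBFS, a minimal sketch using the Databricks CLI from Python could look like the following. It assumes the CLI is installed and already configured with a token (`databricks configure --token`); the local file and DBFS destination are placeholders.

```python
# Copy a small local file to DBFS by shelling out to the Databricks CLI.
# Assumes `databricks` is on PATH and configured; paths are placeholders.
import subprocess

subprocess.run(
    ["databricks", "fs", "cp", "./sample.csv", "dbfs:/FileStore/landing/sample.csv"],
    check=True,
)
```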
