Help saving a Spark DataFrame to the local file system-CodePudding

I want to save the result of a Spark SQL DataFrame as CSV to the local file system. When the job runs on a cluster, however, only the _SUCCESS file is generated in the given directory on the master machine, and the actual CSV files seem to be generated at random on other machines in the cluster. Is there any way to specify where the files are generated? Or can I at least determine which machine the CSV files will be written to?

CodePudding user response:

That is how it works with HDFS. If you could pin the output to one particular machine, it would not be a distributed environment. In single-machine mode the output can be saved to the local machine.

CodePudding user response:

In single-machine mode the output is generated in a local directory on the current machine. In cluster mode you do not specify a machine; you specify an hdfs:// directory directly. That points at the distributed file system, which acts as a file directory shared by all machines. If the cluster uses YARN for resource management, the node that hosts the driver is chosen at random, so if you set a local directory the output may be generated in that directory on whichever machine happened to be chosen.
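
If the result is small enough, one way to control where the file lands is to pull the rows to the driver and write the CSV there yourself. The thread shows no code, so the following is only a minimal sketch assuming PySpark; the helper name write_csv_on_driver and the output path are hypothetical. It relies only on the DataFrame's columns attribute and toLocalIterator() method, and it writes on whatever machine the driver runs on. In client deploy mode that is the machine where spark-submit was started; in YARN cluster mode the driver itself is placed on a node chosen by YARN.

```python
import csv


def write_csv_on_driver(df, local_path):
    """Write all rows of a DataFrame to one CSV file on the driver machine.

    Assumption: `df` is a PySpark DataFrame (anything exposing `columns`
    and `toLocalIterator()` works). Rows are streamed to the driver one
    partition at a time, so the whole result never has to fit in memory,
    but every row still travels over the network to the driver.
    """
    row_count = 0
    with open(local_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(df.columns)          # header line
        for row in df.toLocalIterator():     # runs on the driver
            writer.writerow(list(row))       # a Row behaves like a tuple
            row_count += 1
    return row_count


# Hypothetical usage, with `result_df` being the Spark SQL result:
# write_csv_on_driver(result_df, "/tmp/result.csv")
```

For large results this becomes a bottleneck. In that case it is probably better to write to an hdfs:// directory as suggested above and then copy the part files to the machine you want, for example with hdfs dfs -getmerge.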