My environment is Spark 2.1 + HDP 2.6, running Spark on YARN, and I am using PySpark with Python 3.5.
When I run a statement that uses distinct(), like the following:
user_data = sc.textFile("testdata/u.user")
user_fields = user_data.map(lambda line: line.split("|"))
num_genders = user_fields.map(lambda fields: fields[2]).distinct().count()
the following exception is raised:
  File "/data/opt/hadoop-server/tmp/nm-local-dir/usercache/jsdxadm/appcache/application_1494985561557_0001/container_1494985561557_0001_01_000002/pyspark.zip/pyspark/rdd.py", line 72, in portable_hash
    raise Exception("Randomness of hash of a string should be disabled via PYTHONHASHSEED")
Exception: Randomness of hash of a string should be disabled via PYTHONHASHSEED
Judging from the source code, this check appears to have been added for Python 3 in connection with a security fix (string hash randomization):
if sys.version >= '3.3' and 'PYTHONHASHSEED' not in os.environ:
    raise Exception("Randomness of hash of a string should be disabled via PYTHONHASHSEED")
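To make the check easier to reason about, here is a minimal standalone sketch of it. The function name check_hash_seed and its two parameters are hypothetical (they are not part of PySpark); they only exist so the guard can be exercised without a cluster. It compares version tuples instead of version strings, which is a small deviation from the quoted source.

```python
import os
import sys


def check_hash_seed(environ=None, version_info=None):
    """Hypothetical standalone copy of the guard quoted above.

    environ and version_info default to the current process, but can be
    passed in explicitly to see how the guard behaves in other setups.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    # Same condition as the quoted source, using tuples instead of strings.
    if tuple(version_info[:2]) >= (3, 3) and "PYTHONHASHSEED" not in environ:
        raise Exception(
            "Randomness of hash of a string should be disabled via PYTHONHASHSEED"
        )
    return True
```

The point of the sketch is that the guard only looks at the environment of the Python process it runs in. Since the traceback path is inside a YARN container directory, my assumption is that the variable has to be present in that container's Python process, not just in the shell where the job was submitted.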
I tried two approaches that I found online, but neither of them worked. Has anyone run into this, and could you tell me how to solve it?
1. echo "export PYTHONHASHSEED=0" >> /root/.bashrc
2. spark.yarn.appMasterEnv.PYTHONHASHSEED="XXXX"
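For reference, both attempts above target either the login shell or the YARN application master. A third option that I have not confirmed on this cluster is to forward the variable to the executors as well, through the spark.executorEnv.[EnvironmentVariableName] property. The sketch below is only an assumption about what might help; hash_seed_settings is a hypothetical helper name, and the seed value "0" is just an example.

```python
def hash_seed_settings(seed="0"):
    """Return Spark properties that forward PYTHONHASHSEED (hypothetical helper).

    The same seed is set for the executors and for the YARN application
    master so that every Python process hashes strings the same way.
    """
    seed = str(seed)
    return [
        ("spark.executorEnv.PYTHONHASHSEED", seed),
        ("spark.yarn.appMasterEnv.PYTHONHASHSEED", seed),
    ]


# Usage sketch (needs pyspark installed; shown as comments only):
# from pyspark import SparkConf, SparkContext
# conf = SparkConf().setAll(hash_seed_settings(0))
# sc = SparkContext(conf=conf)
```

The same properties could presumably also be passed with --conf on the spark-submit command line or placed in spark-defaults.conf.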