Spark JDBC read with multiple partitions: severe data skew in the RDD, please take a look
Time:10-08
Spark version 2.4.5, Oracle version 11.2.0.4. Spark fetches the data from Oracle through JDBC, and I split the read into partitions by taking a remainder on Oracle's ROWID, with the expression MOD(ASCII(SUBSTR(ROWID, 1)), 16). Fetching the data itself works without problems. In theory, shouldn't the fetched data be stored evenly across 16 RDD partitions, one per remainder value? In my case 16 RDD partitions are generated, but all the data ends up in partition 0, the first partition, and the other 15 partitions are empty. Could everyone help me see what is going on? There is one more question: shouldn't taking the remainder return an integer? Why does the remainder arrive in the RDD as 9.000000000 instead of 9? The code and data are as follows:
Why does the remainder arrive in the RDD as 9.000000000 instead of 9? Because you are using an Oracle function, MOD(), and it returns a decimal (an Oracle NUMBER): MOD(ASCII(SUBSTR(ROWID, 1)), 20)
You have lost me there. Why would I want to take it modulo 20?
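To illustrate the point about the decimal, here is a minimal plain-Scala sketch with no Spark or Oracle involved. It only shows that a decimal held at a fixed scale prints with trailing zeros while still being the integer 9. The object name `RemainderAsDecimal` and the scale of 9 are assumptions, chosen to reproduce the value quoted in the question.

```scala
import java.math.BigDecimal

object RemainderAsDecimal {
  // Assumption: scale 9 is used only to reproduce the "9.000000000" from the question.
  val fetched: BigDecimal = new BigDecimal("9").setScale(9)

  // The same numeric value read back as a plain Int.
  // intValueExact throws ArithmeticException if there is a non-zero fractional part.
  val asInt: Int = fetched.intValueExact()

  def main(args: Array[String]): Unit = {
    println(fetched) // the decimal rendering, trailing zeros included
    println(asInt)   // the plain integer
  }
}
```

On the Spark side, casting the RN column to an integer type after loading (for example with `cast("int")` on the column) should give the plain 9, assuming every remainder is a whole number.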
val numPartition = 16
val jdbcDF = spark.read.format("jdbc")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("url", "jdbc:oracle:thin:@//172.28.88.26:1521/DSHIELD")
  .option("dbtable", s"(SELECT MOD(ASCII(SUBSTR(ROWID, 1)), ${numPartition}) RN, A.* FROM CFA_PERSONBASEINFO_2 A)")
  .option("user", "FISS_NEW")
  .option("password", "FISS_NEW")
  .option("numPartitions", numPartition)
  .option("partitionColumn", "RN")
  .option("lowerBound", 0)
  .option("upperBound", numPartition)
  .option("fetchsize", 100000)
  .load()
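For reference, this is a rough sketch of how the partitionColumn, lowerBound, upperBound and numPartitions options are turned into one WHERE clause per partition. It is a simplified approximation written for this thread, not Spark's actual source. The object name `JdbcPartitionSketch` and the function `predicates` are hypothetical, and the sketch assumes at least two partitions and bounds wide enough that the stride is at least 1.

```scala
object JdbcPartitionSketch {
  // Approximates the per-partition predicates of a partitioned JDBC read.
  // Assumption: numPartitions >= 2 and (upperBound - lowerBound) >= numPartitions.
  def predicates(column: String, lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "this sketch only covers two or more partitions")
    // Width of each range; integer division, as in the options used above.
    val stride: Long = upperBound / numPartitions - lowerBound / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lowerBound + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi or $column is null"   // first partition is open below and takes NULLs
      else if (i == numPartitions - 1) s"$column >= $lo" // last partition is open above
      else s"$column >= $lo AND $column < $hi"
    }
  }

  def main(args: Array[String]): Unit = {
    // Same settings as the read above: column RN, bounds 0 and 16, 16 partitions.
    predicates("RN", 0L, 16L, 16).zipWithIndex.foreach { case (p, i) =>
      println(s"partition $i: $p")
    }
  }
}
```

Under these predicates, rows whose RN is 0 or NULL go to partition 0, and a row with RN = 9 would be expected in partition 9. One way to inspect where the rows actually land is to group the loaded DataFrame by `spark_partition_id()` from `org.apache.spark.sql.functions` and count the rows per partition.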