import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("testApp"))
val sqlContext = new SQLContext(sc)
implicit val region = Region.CN_NORTH_1
val tempS3Dir = "s3a://redshift-test/redshift/red/"
// set the S3 connection information
sqlContext.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AKIA3ZwewewewCHYE")
sqlContext.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "wg2mPMDNtcqeweweweweCSu7Q+JJHNPT2O")
sqlContext.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.cn-north-1.amazonaws.com.cn")
sqlContext.setConf("driver", "com.amazon.redshift.jdbc4.Driver")
val dataDF = sqlContext.read
  .format("csv")
  .option("header", true)
  .load("s3a://redshift-test/redshift/out/test0.csv")
// read the table data
val test_union = sqlContext.read
  .format("jdbc")
  .option("url", jdbcURL)
  .option("dbtable", "test_union")
  .load()
// rows that are in dataDF but not in test_union
val data = dataDF.except(test_union)
data.show()
data.write
  .mode(SaveMode.Overwrite) // Overwrite means the table is reloaded
  .option("header", true)
  .jdbc(jdbcURL, "test_test", new Properties)
sc.stop()
Execution is very slow; it just keeps spinning. Why? I think val data = dataDF.except(test_union) is the problem, but I don't know how to fix it.
CodePudding user response:
I haven't used this operation myself, but I think except, even if it is not as heavy as a Cartesian product, still has to go over the data many times, so I would not expect it to be fast.
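One thing that may be worth trying is a left anti join on a key column instead of except. The sketch below is only an illustration under assumptions that are not in the original post: it assumes Spark 2.0 or later (for SparkSession and the "left_anti" join type), it assumes both tables share a unique key column, here called "id", and it uses small in-memory DataFrames in place of the CSV file and the Redshift table. The object name ExceptSketch and the helper rowsMissingFrom are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExceptSketch {
  // Hypothetical helper: rows of `left` whose `key` value has no match in `right`.
  // Only the key column is compared and duplicates in `left` are kept,
  // so this equals except only when `key` uniquely identifies a row.
  def rowsMissingFrom(left: DataFrame, right: DataFrame, key: String): DataFrame =
    left.join(right, Seq(key), "left_anti")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // use all local cores instead of a single thread
      .appName("exceptSketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for dataDF (CSV on S3) and test_union (Redshift table).
    val dataDF = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")
    val testUnion = Seq((2, "b")).toDF("id", "name")

    // except compares every column of every row and de-duplicates the result.
    val viaExcept = dataDF.except(testUnion)
    viaExcept.explain() // print the physical plan to see which step is expensive

    // Anti join on the assumed key column only.
    val viaAntiJoin = rowsMissingFrom(dataDF, testUnion, "id")
    viaAntiJoin.show()

    spark.stop()
  }
}
```

Calling explain() on the real DataFrames should show whether the time goes into the comparison itself or into reading the inputs. If the table read over JDBC is large, the read may also be part of the slowdown, because without partitioning options Spark fetches a JDBC table through a single connection.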