We need to use Sqoop to extract data from MySQL into HDFS over an SSL connection. Following the relevant AWS and Sqoop documentation, we put together the following command:
sqoop-import \
--connect "jdbc:mysql://remote-db.amazonaws.com.cn:3306/TSTWOWDB?verifyServerCertificate=false&useSSL=true&requireSSL=true&sslMode=VERIFY_IDENTITY&trustCertificateKeyStoreUrl=/home/etl/ivan/ssl/clientkeystore.jks&trustCertificateKeyStorePassword=xxxxxx" \
--username "TEST_USER" --password "xxxxxx" \
--table "t_wrong_qrcodes" \
--target-dir /tmp/ivan/t_wrong_qrcodes \
-m 1
Here, '/home/etl/ivan/ssl/clientkeystore.jks' is a local file path on the server from which the job is submitted.
After the job starts, it fails with the following error:
Caused by: java.io.FileNotFoundException: /mnt/home/etl/ivan/ssl/clientkeystore.jks (No such file or directory)
This seems to be because Sqoop launches a MapReduce job, and the path '/mnt/home/etl/ivan/ssl/clientkeystore.jks' does not exist on the compute nodes.
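A quick way to test this diagnosis is to check, from the submitting host, whether the trust store exists at the same absolute path on each node that can run map tasks. This is a minimal sketch, assuming key-based SSH access to the nodes; `check_nodes` is a hypothetical helper, the node names are passed as arguments (for example, hosts taken from `yarn node -list`), and the checked path defaults to the one in the command above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the trust store exists at the same
# absolute path on each node given on the command line.
# Assumes key-based (non-interactive) SSH access from the submitting host.
TRUSTSTORE=${TRUSTSTORE:-/home/etl/ivan/ssl/clientkeystore.jks}

check_nodes() {
  status=0
  for node in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes "$node" test -f "$TRUSTSTORE"; then
      echo "OK $node"
    else
      echo "MISSING $node"
      status=1
    fi
  done
  return "$status"
}

check_nodes "$@"
```

Saved as a hypothetical `check_truststore.sh`, it would be run as `sh check_truststore.sh node-1 node-2`; any `MISSING` line points to a node where a map task would hit the same FileNotFoundException.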
How should these properties be set correctly, and is there a step we are missing?
Sqoop Version - 1.4.7
References:
- https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-sqoop-considerations.html
- https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html
- Warning about SSL connection when connecting to MySQL database
Answer:
We have now solved this. Pay attention to the following three points:
- Sync the trust store file (the .jks file) to every compute node. If you are using AWS or another cloud service, put the trust store file on all nodes, including task nodes if you are using instance fleets.
- The trustCertificateKeyStoreUrl property must be a valid URL, which can start with file:/, for example trustCertificateKeyStoreUrl=file:/home/hadoop/ssl/clientkeystore.jks
- Set the useSSL property to true; otherwise the connection will not be encrypted with SSL.
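Putting the three points together, the fixed job might look roughly like the sketch below. It assumes the trust store sits at /home/hadoop/ssl/clientkeystore.jks on the submitting host, that this host can reach the nodes with ssh/scp, and that `NODES` (a hypothetical variable) lists every core and task node. Apart from the `file:` URL, the connection properties and Sqoop options are carried over unchanged from the question.

```shell
#!/bin/sh
# Sketch only: NODES and the "xxxxxx" passwords are placeholders.
TRUSTSTORE=/home/hadoop/ssl/clientkeystore.jks   # must be the same path on every node
TRUSTSTORE_PASS=xxxxxx
NODES=${NODES:-}                                  # e.g. "core-1 core-2 task-1"

# Point 1: copy the trust store to every node that can run map tasks.
for node in $NODES; do
  ssh "$node" mkdir -p "$(dirname "$TRUSTSTORE")"
  scp "$TRUSTSTORE" "$node:$TRUSTSTORE"
done

# Points 2 and 3: pass the trust store as a file: URL and keep useSSL=true.
CONNECT="jdbc:mysql://remote-db.amazonaws.com.cn:3306/TSTWOWDB?verifyServerCertificate=false&useSSL=true&requireSSL=true&sslMode=VERIFY_IDENTITY&trustCertificateKeyStoreUrl=file:${TRUSTSTORE}&trustCertificateKeyStorePassword=${TRUSTSTORE_PASS}"

# Only launch the import where Sqoop is actually installed.
if command -v sqoop-import >/dev/null 2>&1; then
  sqoop-import \
    --connect "$CONNECT" \
    --username "TEST_USER" --password "xxxxxx" \
    --table "t_wrong_qrcodes" \
    --target-dir /tmp/ivan/t_wrong_qrcodes \
    -m 1
fi
```

For example, `NODES="core-1 task-1" sh sqoop_ssl_import.sh`, where both the node names and the script name are placeholders. With `NODES` unset, the copy step is skipped, which suits a cluster where the file has already been distributed.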