I was asked to pull data from Hadoop (Impala or Hive) and insert it into Teradata. I tried to pull the data as a CSV and insert it into Teradata using a Python script. However, every time I tried to download the CSV, it failed with a network error (possibly an issue with my internet connection). Is there any way I can make this task simpler and easier? I have zero knowledge of Hadoop, so please help with a detailed explanation. Thank you very much!
CodePudding user response:
Yes, you can use Sqoop export. You need to set up the Teradata JDBC driver first, and then you are good to go. Hopefully your admin can help here.
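For reference, setting up the driver usually just means putting the Teradata JDBC jars where Sqoop can load them. A minimal sketch, assuming Sqoop is installed at $SQOOP_HOME and you have downloaded terajdbc4.jar (and, for older driver versions, tdgssconfig.jar) from Teradata; the paths and jar names are assumptions to adapt to your cluster:

# Copy the Teradata JDBC driver jars into Sqoop's lib directory.
# Jar names and $SQOOP_HOME are assumptions; check your cluster's layout.
cp terajdbc4.jar tdgssconfig.jar $SQOOP_HOME/lib/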
sqoop export \
  --connect jdbc:teradata://server-name/DATABASE=database-name,DBS_PORT=server-port \
  --username uname \
  --password pwd \
  --table Teradata_table \
  --hcatalog-database db_name \
  --hcatalog-table sample_table \
  -m 18
Now, this is a minimal (MVP) solution; you need to check whether it works for all your tables and whether you need to implement password aliasing or apply any other security measures.
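If your cluster has the Hadoop credential provider enabled, one way to avoid a plaintext --password is Sqoop's --password-alias option. A hedged sketch, where the alias name (teradata.pwd.alias) and the JCEKS path are assumptions you would adapt:

# Store the Teradata password once in a Hadoop credential store
# (you will be prompted to type it).
hadoop credential create teradata.pwd.alias \
  -provider jceks://hdfs/user/youruser/teradata.jceks

# Reference the alias instead of passing the password on the command line.
sqoop export \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/youruser/teradata.jceks \
  --connect jdbc:teradata://server-name/DATABASE=database-name \
  --username uname \
  --password-alias teradata.pwd.alias \
  --table Teradata_table \
  --hcatalog-database db_name \
  --hcatalog-table sample_table \
  -m 18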
EDIT/Update: Regarding @Fred's comment on TDCH (Teradata Connector for Hadoop). This is a free tool (installed in the Hadoop cluster) that can move data between Hive and Teradata more efficiently than Sqoop. Please note, if you have complex, high-volume tables to move, this can be the better option.
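For completeness, a TDCH export is typically launched with hadoop jar against the connector's export tool. The sketch below follows common TDCH usage, but the jar path, class name, and option names are assumptions you should check against the TDCH version your admin installs:

# Hypothetical TDCH invocation; verify the jar location and flags for your version.
hadoop jar /usr/lib/tdch/teradata-connector.jar \
  com.teradata.connector.common.tool.ConnectorExportTool \
  -url jdbc:teradata://server-name/DATABASE=database-name \
  -username uname \
  -password pwd \
  -jobtype hive \
  -method batch.insert \
  -sourcedatabase db_name \
  -sourcetable sample_table \
  -targettable Teradata_table \
  -nummappers 18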
Thank you @Fred for mentioning this.