Home > other >  SQOOP NOLOGGING export task to an Oracle DB
SQOOP NOLOGGING export task to an Oracle DB

Time:02-05

I'm totally stumped as I'm trying to include NOLOGGING into my SQOOP export task to an Oracle Database from HIVE.

The SQOOP user guide -https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_nologging says to use :- Doraoop.nologging=true

I think I've added the below into my code correctly but it does not seem to work.

    <property>
    <name>Doraoop.nologging</name>
    <value>true</value>
    </property>

The script below runs but I'm not seeing any performance gains which makes me think it is not working.

<!-- Sqoop export of data from HDFS to OR Datalab -->    
<action name="SQOOP_EXPORT" retry-max="2" retry-interval="5">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
    <property>
    <name>mapreduce.job.queuename</name>
    <value>${yarn_queueName}</value>
    </property>
    <property>
    <name>org.apache.sqoop.export.text.dump_data_on_error</name>
    <value>true</value>
    </property>
    <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://hdfs/user/lib/keystore.pswd</value>
    </property>
    <property>
    <name>Doraoop.nologging</name>
    <value>true</value>
    </property>
    <property>
    <name>sqoop.export.records.per.statement</name>
    <value>100000</value>
    </property>
    <property>
    <name>sqoop.export.statements.per.transaction</name>
    <value>10</value>
    </property>
    </configuration>
    <arg>export</arg>
    <arg>--connect</arg>
    <arg>jdbc:oracle:thin:@*****test****:12345/DATALAND</arg>
    <arg>--username</arg>
    <arg>LANDING</arg>
    <arg>--password-alias</arg>
    <arg>pswd.ordl</arg>
    <arg>--export-dir</arg>
    <arg>${sqoopHDFSDataDir}</arg>
    <arg>--table</arg>
    <arg>${sqoopDataTable}</arg>
    <arg>--columns</arg>
    <arg>${sqoopDataColumns}</arg>
    <arg>--input-fields-terminated-by</arg>
    <arg>\001</arg>
    <arg>--input-lines-terminated-by</arg>
    <arg>\n</arg>
    <arg>--input-null-string</arg>
    <arg>\\N</arg>
    <arg>--input-null-non-string</arg>
    <arg>\\N</arg>
    <arg>-m</arg>
    <arg>${sqoopNumMappers}</arg>
</sqoop>
<ok to="HIVE2_LOG_SCRIPT"/>
<error to="Email_failure"/>
</action>

CodePudding user response:

Nologging in Oracle doesn't always mean "no logging". It only applies to certain direct write imports in specific situations, and doesn't necessarily mean that no transaction logs are recorded even then. The nologging option can also be overridden by database or tablespace settings that force logging to always occur (likely the case in any production database).

Also beware that nologging operations affect your ability to recover from backups: always perform backups before and immediately after any nologging operation, and do not allow other logging transactions to take place while nologging operations are in progress.

From the documentation:

https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/logging_clause.html#GUID-C4212274-5595-4045-A599-F033772C496E

"If you specify NOLOGGING, then the creation of a database object, as well as subsequent conventional inserts, will be logged in the redo log file. Direct-path inserts will not be logged...

If the object for which you are specifying the logging attributes resides in a database or tablespace in force logging mode, then Oracle Database ignores any NOLOGGING setting until the database or tablespace is taken out of force logging mode...

NOLOGGING is supported in only a subset of the locations that support LOGGING. Only the following operations support the NOLOGGING mode:

DML:

  • Direct-path INSERT (serial or parallel) resulting either from an INSERT or a MERGE statement. NOLOGGING is not applicable to any
    UPDATE operations resulting from the MERGE statement.

  • Direct Loader (SQL*Loader)

DDL:

  • CREATE TABLE ... AS SELECT (In NOLOGGING mode, the creation of the table will be logged, but direct-path inserts will not be logged.)

  • CREATE TABLE ... LOB_storage_clause ... LOB_parameters ... CACHE | NOCACHE | CACHE READS

  • ALTER TABLE ... LOB_storage_clause ... LOB_parameters ... CACHE | NOCACHE | CACHE READS (to specify logging of newly created LOB columns)

  • ALTER TABLE ... modify_LOB_storage_clause ... modify_LOB_parameters ... CACHE | NOCACHE | CACHE READS (to change logging of existing LOB columns)

  • ALTER TABLE ... MOVE

  • ALTER TABLE ... (all partition operations that involve data movement)

  • ALTER TABLE ... ADD PARTITION (hash partition only)

  • ALTER TABLE ... MERGE PARTITIONS

  • ALTER TABLE ... SPLIT PARTITION

  • ALTER TABLE ... MOVE PARTITION

  • ALTER TABLE ... MODIFY PARTITION ... ADD SUBPARTITION

  • ALTER TABLE ... MODIFY PARTITION ... COALESCE SUBPARTITION

  • CREATE INDEX

  • ALTER INDEX ... REBUILD

  • ALTER INDEX ... REBUILD [SUB]PARTITION

  • ALTER INDEX ... SPLIT PARTITION"

Also, see here: http://www.dba-oracle.com/t_nologging_append.htm

CodePudding user response:

First of all pls ensure you have oraoop connector. Hive version > 1.4.6 has it. Second, these -D commands are sensitive to position of the command. Can you pls run a simple sqooop statement with all your parameters and see what issues you are facing. Ideal sqoop statement with all -D options will look like below

sqoop import -Dsqoop.export.records.per.statement=10000 -Dsqoop.export.statements.per.transaction=100 -Doraoop.nologging=true <rest of commands>
  •  Tags:  
  • Related