Null Pointer Exception with JdbcIO read in Apache Beam [DataFlow]


I have written code that reads data from Oracle using Apache Beam, with Dataflow as the runner.

I am facing a weird error for a few tables while ingesting data. Several tables have around 200 columns with various Oracle data types such as DATE, NUMBER, and VARCHAR2(***).

Our requirement is to migrate all columns through Dataflow to BigQuery. When we select all the columns in the query, it throws the NullPointerException shown below, so we tried selecting subsets of columns. In that case:

  • When the combined declared size of the selected columns is less than ~46,752 bytes, the pipeline runs successfully.
  • When it crosses this limit, the pipeline starts throwing the NullPointerException.

In other words, if the limit were 2,000 bytes (hypothetically; the actual value we are hitting is around 46,752 bytes), we would only be able to select two VARCHAR2(1000) columns, or four VARCHAR2(500) columns, and so on.

Note: we determined the ~46,752-byte threshold empirically, by adding columns to the query one by one and re-running the code.

We are not sure whether any such limit is built into Apache Beam's Java JDBC connector, but the migration fails whenever the selected columns cross this limit.

Please let me know if I am missing something here, or missing some parameter, when reading data through JdbcIO.

Below is the snippet where the code fails. It is the entry point of our pipeline, and it fails during the read alone. I have not included the write to BigQuery, since it never executes: the pipeline fails at JdbcIO.read() first (we cross-checked this by commenting out the BigQuery logic as well).

Code to read data from Oracle

// Read from JDBC
Pipeline p2 = Pipeline.create(
                PipelineOptionsFactory.fromArgs(args).withValidation().create());

String query2 = "SELECT Col1...Col00 FROM table WHERE rownum<=1000";
PCollection<TableRow> rows = p2.apply(JdbcIO.<TableRow>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                                "oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@//localhost:1521/orcl")
                        .withUsername("root")
                        .withPassword("password"))
                .withQuery(query2)
                .withRowMapper(new JdbcIO.RowMapper<TableRow>() {
                    @Override
                    public TableRow mapRow(ResultSet resultSet) throws Exception {

                        // 'schema' and getSchemaFromResultSet(...) are defined elsewhere in our code
                        schema = getSchemaFromResultSet(resultSet);
                        TableRow tableRow = new TableRow();
                        List<TableFieldSchema> columnNames = schema.getFields();
                        // ... populate tableRow from resultSet using columnNames ...
                        return tableRow;
                    }
                })
        );

p2.run().waitUntilFinish();

Error (only occurs when the selected columns cross the byte limit described above)

 Error message from worker: java.lang.NullPointerException
oracle.sql.converter.CharacterConverter1Byte.toUnicodeChars(CharacterConverter1Byte.java:344) 
oracle.sql.CharacterSet1Byte.toCharWithReplacement(CharacterSet1Byte.java:134)   
oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:964) 
oracle.jdbc.driver.DBConversion.CHARBytesToJavaChars(DBConversion.java:867)  
oracle.jdbc.driver.T4CVarcharAccessor.unmarshalOneRow(T4CVarcharAccessor.java:298)  
oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:934)     
oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:853)     
oracle.jdbc.driver.T4C8Oall.readRXD(T4C8Oall.java:699)     
oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:337)     
oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)     
oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)     
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)     
oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)     
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)     
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)     
oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)     
oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)     
oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)     
org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)     
org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)     
org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:1381)

More Details

  • Oracle Version - 11g
  • Apache Beam SDK - Java
  • A few of the selected columns contain NULL values, but that should not cause any issues.
  • The issue is not tied to specific columns, since I have tried all possible combinations of columns.
  • There is no limit on the number of selected columns: I am able to read from an Oracle table with more than 300 columns as long as the total byte size stays under 46,752 bytes.

CodePudding user response:

The JdbcIO read transform has a fetchSize property that controls how many rows are fetched from the database per round trip. The default value is 50,000 rows, and it can be overridden with the withFetchSize(int fetchSize) method (see the JdbcIO docs). This also fits the threshold you measured: the Oracle driver sizes its character-conversion buffers from the fetch size multiplied by the declared width of a row, so 50,000 rows at ~46 KB each would call for roughly 2 GB in a single buffer, which plausibly overflows and surfaces as the NullPointerException inside CharacterConverter1Byte. Lowering the fetch size keeps the buffer small.
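For illustration, here is a minimal sketch of the read from the question with the fetch size overridden. The value 1000 and the rowMapper variable are assumptions for the example; query2 and the connection settings are the ones from the question.

// Same read as in the question, with an explicit fetch size.
// 1000 rows per round trip is an illustrative value, not a recommendation;
// tune it to your row width and worker memory.
PCollection<TableRow> rows = p2.apply(JdbcIO.<TableRow>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                                "oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@//localhost:1521/orcl")
                        .withUsername("root")
                        .withPassword("password"))
                .withQuery(query2)
                .withFetchSize(1000)        // override the 50,000-row default
                .withRowMapper(rowMapper)); // the JdbcIO.RowMapper<TableRow> from the question

The trade-off is more database round trips in exchange for a small, bounded driver-side buffer; with rows as wide as yours, a fetch size in the hundreds or low thousands is usually a safer starting point.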
