BigQuery jobs failing due to too many columns

I am trying to push data from the Salesforce API to BigQuery using Google Apps Script. For this example, I am pushing Opportunities to BigQuery.

I have a variable called opps, prepared by calling the Salesforce API and keeping only the fields I care about. It looks like this:

var opps = [];
for (var i in arrOpportunities.records) {
  let data = arrOpportunities.records[i];
  let createDate = Utilities.formatDate(new Date(data.CreatedDate), "GMT", "dd-MM-YYYY");
  let modDate = Utilities.formatDate(new Date(data.LastModifiedDate), "GMT", "dd-MM-YYYY");

  // Build an A1-style cell reference and a lookup formula for the Company column.
  let a1 = 'C' + (parseInt(i, 10) + 2);
  let companyFormula = '=IFERROR(INDEX(Accounts,MATCH(' + a1 + ',Accounts!$B2:$B,0),1),"")';
  opps.push([data.Name, data.Id, data.AccountId, companyFormula, data.StageName, data.IsClosed, data.IsWon, createDate, modDate, data.ContactId, data['Region_by_manager__c'], data['Industry__c'], data['Customer_type__c'], data['acc__c'], data['Reason_for_lost_deal__c'], data['hs_deal_id__c'], data['Solution__c'], data['Solution_Elements__c']]);
}

Afterwards I check whether the table already exists in BigQuery by calling guideInsertData('Opportunities', 'Opportunities', opps); after the for loop above. It doesn't do the check itself; it delegates that to another function (sketched after the next snippet) and just guides the process.

The function here looks like this:

function guideInsertData(tableName, headers, data) {
  const tableExists = checkTables(tableName);

  if (tableExists == false) {
    let createTable = prepareSchema(headers, tableName);
    if (createTable == false) {
      throw 'Unable to create table';
    }
  }
  insertData(tableName, data);
}
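
For context, checkTables isn't shown here; a minimal sketch of what such an existence check might look like with the BigQuery advanced service, assuming the same project and dataset IDs used below (prepareSchema would create the missing table with BigQuery.Tables.insert in a similar way):

function checkTables(tableName) {
  const projectId = 'lateral-scion-352013', datasetId = 'CRM_Data';
  try {
    // Tables.get throws if the table does not exist, so a successful call means it does.
    BigQuery.Tables.get(projectId, datasetId, tableName);
    return true;
  } catch (err) {
    return false;
  }
}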

And then finally we get to the actual function for inserting the data:

function insertData(tableName, arrData) {
  const projectId = 'lateral-scion-352013', datasetId = 'CRM_Data', tableId = tableName;
  const job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        // sourceFormat isn't set, so BigQuery treats the blob as CSV by default
        // (the third error message below calls this out); skipLeadingRows skips a header row.
        skipLeadingRows: 1
      }
    }
  };

  const strData = arrData.join("\n");
  const data = Utilities.newBlob(strData,"application/octet-stream");
  Logger.log(strData);
  Logger.log(data.getDataAsString());

  try {
    BigQuery.Jobs.insert(job, projectId, data);
    let success = 'Load job started. Check on the status of it here: ' +
        'https://console.cloud.google.com/home/activity?project=' + projectId;
    Logger.log(success);
    return success;
  } catch (err) {
    Logger.log(err);
    Logger.log('unable to insert job');
    return 'unable to insert data';
  }
}

This is based on https://developers.google.com/apps-script/advanced/bigquery#load_csv_data.

I convert everything into the blob as the example requests. I can't see anything wrong in the log, and I even get the message that the load job has started successfully.

However, when I check the status of the activity in BigQuery, it shows Failed: Complete BigQuery job, with these error messages:

  • Invalid argument (HTTP 400): Error while reading data, error message: Too many values in row starting at position: 234. Found 23 column(s) while expected 18.
  • Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 0; errors: 1; max bad: 0; error percent: 0
  • You are loading data without specifying data format, data will be treated as CSV format by default. If this is not what you mean, please specify data format by --source_format.

How do I fix this error?

CodePudding user response:

OK, so it looks like there was an issue in how the rows were stringified. arrData.join("\n") uses the default Array-to-string conversion, which neither quotes values nor escapes the commas inside them, so the five commas inside the companyFormula string were read as extra column separators (which is where 23 columns instead of 18 comes from). I had to stringify each value in each nested array first, and then join the rows into CSV.
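
A quick illustration (not from the original code) of what the original join produced for a row containing the formula:

// The default Array-to-string conversion does not quote values, so the commas
// inside the formula become extra column separators when BigQuery parses the CSV:
const row = ['Acme Corp', '=IFERROR(INDEX(Accounts,MATCH(C2,Accounts!$B2:$B,0),1),"")'];
Logger.log([row].join('\n'));
// Logs: Acme Corp,=IFERROR(INDEX(Accounts,MATCH(C2,Accounts!$B2:$B,0),1),"")
// A CSV parser sees 7 values here instead of 2.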

So in the end, the code looks like this:

  // JSON.stringify quotes each value and escapes any embedded quotes;
  // the replace() turns JSON-style \" escapes into CSV-style "" escapes.
  const strData = arrData.map(values => values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
  const csvData = strData.map(values => values.join(',')).join('\n');
  const data = Utilities.newBlob(csvData, "application/octet-stream");

Also, all date values need to be in yyyy-MM-dd format, not dd-MM-yyyy.
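
For example, the formatDate calls from the question become:

let createDate = Utilities.formatDate(new Date(data.CreatedDate), "GMT", "yyyy-MM-dd");
let modDate = Utilities.formatDate(new Date(data.LastModifiedDate), "GMT", "yyyy-MM-dd");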
