Home > Mobile >  Use Google Apps Script to import multiple .CSV files from Google Drive into BigQuery
Use Google Apps Script to import multiple .CSV files from Google Drive into BigQuery

Time:02-11

I've downloaded a year's worth of .CSV files from my portal with American Express. I want to use Google Apps script to take this data and put it into Google BigQuery. Can't really do this interactively because the BigQuery UX doesn't provide a mechanism to appending data to a table. So, I have written a script to do it:

      var firstFileData = firstFile.getBlob().setContentType('application/octet-stream')
      var firstFileDataString = firstFileData.getDataAsString()
      var firstFileDataCSV = Utilities.parseCsv(firstFileDataString)

The variable firstFileDataString ends up with the contents of the CSV, but what I'm noticing is that the newline character does not exist. Therefore, logic further down in the program sees many columns instead of the desired 3. Some of the reading that I've done suggests that the contemporary ways for creating .CSV files do not use the newline character \n, favoring the carriage return \r instead.

Implementing a RegExp to look for (and replace) the carriage return also doesn't seem to work around the issue.

CodePudding user response:

I did wind up using a RegExp to resolve the issue, but also injected a newline \n character into the array that resulted from the call to Utilities.parseCsv(). My feeling is that there is a more efficient method, but this does work:

          if((j == 0) && (!flg_includeHeader)) {
        //  importCSVintoBigQuery(firstFileData, 1, theProjectId, theDataSetId, theTableId)
            j  
          }
          else {
            t_record_sub.push([t_db_date],[firstFileDataCSV[j][1]],[firstFileDataCSV[j][2]]   "\n")

          }

      }
      var regex = new RegExp(('\n,'),'gi')
      t_record.push([t_record_sub])

      theBlob = Utilities.newBlob(t_record.toString().replace(regex, '\n'), 'application/octet-stream')

CodePudding user response:

Perhaps you can use just:

t_record_sub.push([t_db_date, firstFileDataCSV[j][1], firstFileDataCSV[j][2]])

instead of:

t_record_sub.push([t_db_date],[firstFileDataCSV[j][1]],[firstFileDataCSV[j][2]]   "\n")

Then, probably, you don't need to make the replaces with regexp.

If you want to make a string from 2D array it can be something like this:

var string = t_record_sub.map(x => x.join(',')).join('\n');

Here is the snippet to convert the flat array to the 2D array (3 columns) and then to the string:

var flat_array = [1,2,3,4,5,6,7,8,9];

var t_record_sub = [];
var i = 0;
while (i < flat_array.length) {
    t_record_sub.push([flat_array[i  ], flat_array[i  ], flat_array[i  ]]);
}

var string = t_record_sub.map(x => x.join(',')).join('\n');

console.log(string);

// output:
// 1,2,3
// 4,5,6
// 7,8,9

Or even simpler: directly from the flat array to the string:

var array = [1,2,3,4,5,6,7,8,9];

var i = 0;
var string = '';
while (i < flat_array.length) {
    string  = array[i  ]   ','   array[i  ]   ','   array[i  ]   '\n'
}

console.log(string.slice(0,-1)); // slice to remove the last '\n'

// the same output

  • Related