Home > OS >  Convert PDF to Text and Push to a Google Sheet using Apps Script
Convert PDF to Text and Push to a Google Sheet using Apps Script

Time:11-16

I receive a PDF report weekly that I would like to automate into a Google Sheet Workbook using App Script.

I was able to successfully pull the PDF from my Gmail account and extract the text I need but I am having trouble taking the "Table" from the PDF and writing it to Google Sheets.

I used a split function to extract only what I need, and create an "extract" variable that I would like to slice up and use to reconstruct in google sheets. The table extract comes out like this:

Groups Team A 1,058 1,058 Team B 757 757 Team C 179 6,830 7,009 Team D 1,359 1,359 Total 179 10,004 10,183

Ideally I would like the table to look like this in google sheets:

Teams   X   Y   Total
Team A  1,058       1,058
Team B  757 757 1,514
Team C  179 6,830   7,009
Team D  1,359       1,359
Total   179 10,004  10,183
            

Primary Question: How can I construct my text extract into an Array that I can push / write to google sheets?

Below is the script I have so far, but I can't seem to write the data to google sheet using the sheet.getrange.setvalues(data). I was expecting to at least be able to push the text to google sheets so maybe I could manipulate it there with formulas.

I receive an error "TypeError: sheet.getrange is not a function". Referring to my primary question above, do I need to reformat the data in a way that can write out correctly to google sheets?

``


function importPDFFromGmail() {
 var threads = GmailApp.search("TEST PDF PULL"); // enter search criteria here
  var messages = threads[0].getMessages();
  var message = messages[messages.length - 1];  // Takes the last thread in the email chain
  var attachment = message.getAttachments()[0];  

  // Is the attachment a PDF file
      attachment.setContentTypeFromExtension();
      if (attachment.getContentType() === "application/pdf") {
          var sheet = SpreadsheetApp.openById('<Sheet ID>').getSheetByName('PDF TEST');
        
         var blob = attachment.getAs(MimeType.PDF);  
         var filetext = pdfToText( blob, {keepTextfile: false} );
         var extract = filetext.split('Sum')[1].split('Sum')[0]; //  part between 'Sum' and 'Sum'
         
         var lines = extract.split("\n");
         // var extract_delimited = lines.map(function(lin){return lin.split(" ")}); 
         
         var data = lines.map(function(lin){return lin.split(" ")});  
   
        
         sheet.clear();
         sheet.getrange(1,1,data.length,data[0].length).setValues(data);
      }
          //  console.log("extract");
          // console.log(extract)
          //  console.log("lines");
          // console.log(lines)
          //  console.log("extract delimited");
          // console.log(extract_delimited;          
};

``

CodePudding user response:

First fix the typo in sheet.getRange().

Your String.split() call will get rows that vary in length, while Range.setValues() expects a 2D array where every row is of the same length. You can set the length of all rows to the common max value like this:

  const numColumns = Math.max(...data.map(row => row.length));
  data.forEach(row => row.length = numColumns);
  sheet.getRange('A1')
    .offset(0, 0, data.length, numColumns)
    .setValues(data);
  • Related