Home > Software design >  Use google apps script to extract text from pdf file found at web address and and insert it into a G
Use google apps script to extract text from pdf file found at web address and and insert it into a G

Time:11-15

In the example below I left the folder and ss blank.

Idea is to retrieve the number after the text "Emerging Markets (" found in the file at the url specified in the code and then insert it into cell b2 in the google sheet specified.

Not getting any errors, but code is not working. Would appreciate your help. Novice here.

Thanks!

const FOLDER_ID = ""; //Folder ID of all PDFs
const SS = "";//The spreadsheet ID
const SHEET = "MSCI";//The sheet tab name


function OpenFile() {
 var url = "https://www.yardeni.com/pub/mscipe.pdf";
  
  var blob = UrlFetchApp.fetch(url).getBlob();
  var resource = {
    title: blob.getName(),
    mimeType: blob.getContentType()
  };

  // Enable the Advanced Drive API Service
  var file = Drive.Files.insert(resource, blob, {ocr: true, ocrLanguage: "en"});

  // Extract Text from PDF file
  var doc = DocumentApp.openById(file.id);
  var text = doc.getBody().getText();
  
return text;

const identifier = {
    start: `Emerging Markets (`,
    start_include: false,
    end: `)`,
    end_include: false
  };
  let results = getDocItems(docID, identifier);
  return results;
  }

function importToSpreadsheet(results){
  const sheet = SpreadsheetApp.openById(SS).getSheetByName(SHEET);
 
  var cell = sheet.getRange("B2");
  cell.setValue(results);
}
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>

CodePudding user response:

I see two functions: OpenFile() and importToSpreadsheet(results), but I see no lines where the functions are called.

Just a guess. Perhaps you need to add at the end of your code this line:

importToSpreadsheet(OpenFile());

Update

The OpenFile() function gets you all the text. If you need only the part of the text between 'Emerging Markets (' and ')' you can cut it out this way:

var text = OpenFile(); // all text
var part = text.split('Emerging Markets (')[1].split(')')[0]; // a part between 'Emerging Markets (' and ')'
importToSpreadsheet(part); // put the part in the cell

The lines from const identifier = {... to ...return results; are redundant. Probably they were taken from another sample and don't belong this code.

  • Related