Just a simple question, as in the title: I have a PDF file that contains only text, and I want to load it and extract the text.
CodePudding user response:
I believe your goal is as follows.
- You want to retrieve the text from a PDF file that contains only text, using Google Apps Script.
In this case, how about the following flow?
- Convert the PDF to a Google Document with the Drive API, as a temporary file.
- Export text from the created Google Document.
When this is reflected in a script, it becomes as follows.
Sample script:
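This sample uses the Drive API, so please enable the Drive API at Advanced Google services before you test the script. The script uses the title property, which is the Drive API v2 name; with v3 the equivalent property is name.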
function myFunction() {
  const fileId = "###"; // Please set the file ID of the PDF file on Google Drive.

  // Convert the PDF to a Google Document.
  const docId = Drive.Files.copy({title: "temp", mimeType: MimeType.GOOGLE_DOCS}, fileId).id;

  // Retrieve the text from the Google Document.
  const text = DocumentApp.openById(docId).getBody().getText();

  // If you want to remove the temporary Google Document, please uncomment this line.
  // Drive.Files.remove(docId);

  console.log(text); // You can see the retrieved text in the log.

  // DriveApp.createFile("sample.txt", text); // If you want to save the text as a file, please use this line.
}
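If you want the temporary Google Document to be cleaned up even when reading the text fails, one possible variation is the sketch below. It is not part of the original sample: the function name extractPdfText and the idea of passing the Drive and DocumentApp services in as arguments are assumptions made here so that the logic can be tried outside Apps Script. The MIME type string is the value of MimeType.GOOGLE_DOCS, and the title property assumes Drive API v2, as in the sample above.

```javascript
/**
 * Hypothetical helper (not from the original answer).
 * Converts a PDF to a temporary Google Document, reads its text,
 * and removes the temporary document whether or not reading succeeds.
 *
 * @param {string} fileId      File ID of the PDF on Google Drive.
 * @param {object} drive       The Drive advanced service (Drive API v2 assumed).
 * @param {object} documentApp The DocumentApp service.
 * @return {string} The text of the converted document.
 */
function extractPdfText(fileId, drive, documentApp) {
  // Same conversion as in the sample: copy the PDF as a Google Document.
  const docId = drive.Files.copy(
    { title: "temp", mimeType: "application/vnd.google-apps.document" },
    fileId
  ).id;

  try {
    // Read the body text of the temporary Google Document.
    return documentApp.openById(docId).getBody().getText();
  } finally {
    // Runs on success and on error, so the temporary file is not left behind.
    drive.Files.remove(docId);
  }
}

// Example entry point for Apps Script; Drive and DocumentApp exist only there.
function runExtractPdfText() {
  const text = extractPdfText("###", Drive, DocumentApp); // Please set the PDF file ID.
  console.log(text);
}
```

In Apps Script you would run runExtractPdfText after enabling the Drive API at Advanced Google services. Passing the services as arguments is only a convenience for testing; calling Drive and DocumentApp directly inside the function works the same way.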