I found this code: you select a PDF file in an input, and it returns the number of pages the file has. This way of reading PDFs is the only one I have found that reads absolutely all PDFs correctly.
What I am trying to do is isolate the code that reads the PDF file, so that I can pass it the path to the file instead of using the input. The goal is to then read all the files in a folder and display the total number of pages.
But I can't figure out where exactly I would have to pass the path to the PDF file.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>PDF.js Example to Count Number of Pages inside PDF Document</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
</head>
<body>
<div >
<h1 >Count Pages inside PDF Document</h1>
<div >
<input type="file" accept=".pdf" required id="files" >
</div>
<br><br>
<h1 id="result"></h1>
</div>
</body>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.12.313/pdf.min.js"></script>
<script>
let inputElement = document.getElementById('files')
inputElement.onchange = function(event) {
var file = event.target.files[0];
//Step 2: Read the file using file reader
var fileReader = new FileReader();
fileReader.onload = function() {
//Step 4:turn array buffer into typed array
var typedarray = new Uint8Array(this.result);
//Step 5:pdfjs should be able to read this
const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
document.getElementById('result').innerHTML = "The number of pages inside the PDF document is " + pdf.numPages;
// The document is loaded here...
});
};
//Step 3:Read the file as ArrayBuffer
fileReader.readAsArrayBuffer(file);
}
</script>
</html>
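In this code the only thing pdf.js receives is the file's bytes, at `pdfjsLib.getDocument(typedarray)`. That call also accepts a URL string, but that only helps for files served over HTTP, not arbitrary local paths. So the part worth isolating is a function that takes a File (or any Blob) and returns a promise of the page count. A minimal sketch, assuming pdf.js 2.x is loaded as `pdfjsLib`; it swaps the FileReader callbacks for the promise-based `Blob.arrayBuffer()`, and the name `countPdfPages` is just illustrative:

```javascript
// Sketch: the question's page-counting logic pulled out of the onchange
// handler so it can be called with any File or Blob.
// Assumption: `pdfjs` is the pdf.js library object (pdfjsLib in the
// browser); it is passed in instead of read from a global.
async function countPdfPages(fileOrBlob, pdfjs) {
  // Blob.prototype.arrayBuffer() replaces the FileReader onload callback.
  const buffer = await fileOrBlob.arrayBuffer();
  // Same as the original: turn the ArrayBuffer into a typed array.
  const typedarray = new Uint8Array(buffer);
  // getDocument returns a loading task; its promise resolves to the document.
  const pdf = await pdfjs.getDocument(typedarray).promise;
  return pdf.numPages;
}
```

In the page above it could be called as `countPdfPages(event.target.files[0], pdfjsLib).then(n => { ... })`. Passing the library in as a parameter is only a choice to keep the function free of hidden globals.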
Answer:
You need two modifications to make it work. First, add the "multiple" attribute to the input so the user can select multiple PDF files:
<input type="file" multiple accept=".pdf" required id="files" >
Then loop through the selected files to calculate the number of pages in each:
[].forEach.call(event.target.files, file => {
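`event.target.files` is a FileList, which is array-like (numeric indexes plus `length`) but has no `forEach` method; `[].forEach.call(...)` borrows `Array.prototype.forEach` for it. A small illustration, using a hypothetical plain-object stand-in for a FileList, with `Array.from` shown as an equivalent alternative:

```javascript
// Hypothetical stand-in for a FileList: indexed entries plus .length,
// but no Array methods such as forEach.
const fakeFileList = { 0: { name: "a.pdf" }, 1: { name: "b.pdf" }, length: 2 };

// The answer's idiom: borrow Array.prototype.forEach for an array-like.
function namesWithForEachCall(list) {
  const names = [];
  [].forEach.call(list, (file) => names.push(file.name));
  return names;
}

// Equivalent: copy the array-like into a real array first.
function namesWithArrayFrom(list) {
  return Array.from(list, (file) => file.name);
}
```

Both functions return `["a.pdf", "b.pdf"]` for the stand-in; `Array.from` is often easier to read when you also need `map` or `reduce` afterwards.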
Update: Two additional changes have been added.
1. We must reset the file input after the loop. Otherwise, selecting the same file(s) again will not fire the change event, so it seems to work only once.
// clear file selector to allow reuse
event.target.value = "";
2. We must also set the value "workerSrc" to prevent a console warning message.
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.7.570/pdf.worker.min.js';
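The worker file must come from the same pdf.js release as the main library; pdf.js rejects a mismatched API/worker pair with a version-mismatch error. The answer uses 2.7.570 for both, while the question loads 2.12.313, so copying only the workerSrc line into the question's page would mix versions. One way to keep them in sync, sketched under the assumption that the cdnjs URL layout used in the answer holds for the 2.x version you load, is to build the URL from `pdfjsLib.version`; the helper name `workerSrcFor` is hypothetical:

```javascript
// Hypothetical helper: build the worker URL from a pdf.js version string,
// following the cdnjs layout used in the answer (2.x builds ship
// pdf.worker.min.js next to pdf.min.js).
function workerSrcFor(version) {
  return "https://cdnjs.cloudflare.com/ajax/libs/pdf.js/" + version + "/pdf.worker.min.js";
}
```

In the page it would be used as `pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrcFor(pdfjsLib.version);`, so upgrading the `<script>` tag automatically upgrades the worker too.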
Here is the complete code (hold down Shift, or Ctrl/Cmd, to select multiple PDF files):
let inputElement = document.getElementById('files')
inputElement.onchange = function(event) {
// clear results from any previous selection
document.getElementById('result').innerHTML = "";
[].forEach.call(event.target.files, file => {
//Step 2: Read the file using file reader
var fileReader = new FileReader();
fileReader.onload = function() {
//Step 4:turn array buffer into typed array
var typedarray = new Uint8Array(this.result);
//Step 5:pdfjs should be able to read this
const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
document.getElementById('result').innerHTML += "<li>" + file.name + " has " + pdf.numPages + " pages</li>";
// The document is loaded here...
});
};
//Step 3:Read the file as ArrayBuffer
fileReader.readAsArrayBuffer(file);
})
// clear file selector to allow reuse
event.target.value = "";
}
// Must set worker to avoid error: Deprecated API usage: No "GlobalWorkerOptions.workerSrc" specified.
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.7.570/pdf.worker.min.js';
<div >
<h4 >Count Pages inside PDF Document</h4>
<div >
<input type="file" multiple accept=".pdf" required id="files" >
</div>
<br><br>
<ol id="result"></ol>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.7.570/pdf.min.js" integrity="sha512-g4FwCPWM/fZB1Eie86ZwKjOP+yBIxSBM/b2gQAiSVqCgkyvZ0XxYPDEcN2qqaKKEvK6a05+IPL1raO96RrhYDQ==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
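The snippet lists each file but never adds the counts up, which is what the question ultimately asks for. Because the `onload` callbacks can finish in any order, a promise-based version is easier to total. A sketch, assuming `countPages` is any function (hypothetical here) that returns a promise of one file's page count, for example a wrapper around `pdfjsLib.getDocument(...).promise`:

```javascript
// Sketch: count every selected file and add up the total.
// `countPages(file)` is an injected, hypothetical per-file counter that
// returns a promise of a page count, so this part does not depend on pdf.js.
async function summarizePageCounts(files, countPages) {
  // Accepts a FileList or an ordinary array.
  const list = Array.from(files);
  // Start all reads at once; Promise.all keeps results in input order,
  // unlike onload callbacks, which may complete in any order.
  const counts = await Promise.all(list.map((file) => countPages(file)));
  const rows = list.map((file, i) => ({ name: file.name, pages: counts[i] }));
  const total = counts.reduce((sum, n) => sum + n, 0);
  return { rows, total };
}
```

The `rows` can then be rendered as `<li>` items and `total` shown underneath. Note that `Promise.all` rejects as soon as one file fails to parse; `Promise.allSettled` would let the remaining files still be counted if that matters.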
Answer:
You can't.
Browsers don't let you access local paths on a user's computer for security reasons.
The browser never learns that the PDF is at /home/USERNAME/confidentialdocs/file.pdf; it just gets a blob of data with a given filename.
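If the files really do need to be read by path, that has to happen outside the browser, for example in Node.js. (Inside the browser, `<input type="file" webkitdirectory>` lets the user pick a whole folder, but the page still receives only File objects, not paths.) A minimal Node sketch; the `countPages` parameter is a hypothetical per-PDF counter, which with the pdfjs-dist package could wrap `getDocument({ data: bytes }).promise`. The pdfjs-dist import path differs between versions, so it is deliberately left out:

```javascript
// Sketch for Node.js, where reading files by path is allowed.
// Assumption: countPages(bytes) is a hypothetical function that resolves
// to the page count of one PDF given its bytes as a Uint8Array.
const fs = require("fs/promises");
const path = require("path");

async function totalPagesInFolder(dir, countPages) {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  // Keep regular files ending in .pdf (case-insensitive).
  const pdfNames = entries
    .filter((e) => e.isFile() && e.name.toLowerCase().endsWith(".pdf"))
    .map((e) => e.name)
    .sort();
  let total = 0;
  for (const name of pdfNames) {
    // Copy the Buffer into a plain Uint8Array, which pdf.js expects.
    const bytes = new Uint8Array(await fs.readFile(path.join(dir, name)));
    const pages = await countPages(bytes);
    console.log(name + " has " + pages + " pages");
    total += pages;
  }
  return total;
}
```

For example: `totalPagesInFolder("/path/to/folder", countPages).then(total => console.log("Total: " + total + " pages"))`. Files are processed one at a time to keep memory use low for large folders; `Promise.all` would be faster but would load every PDF at once.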