Change the PDF file input so that I can pass the exact file path myself

Time:04-15

I found this code, which lets you select a PDF file in an input and returns the number of pages it has. This way of reading PDFs is the only one I have found that reads absolutely all PDFs correctly.

What I am trying to do is isolate the code that reads the PDF file, so that I can pass it the path to the file instead of using the input. The goal is to then read all the files in a folder and display the total number of pages.

But I can't figure out where exactly I would have to pass the path to the pdf file.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF.js Example to Count Number of Pages inside PDF Document</title>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
</head>
<body>
    <div >
        <h1 >Count Pages inside PDF Document</h1>
    <div >
        <input type="file" accept=".pdf" required id="files" >
    </div>
    <br><br>
    <h1  id="result"></h1>
    </div>
</body>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.12.313/pdf.min.js"></script>
<script>
 
let inputElement = document.getElementById('files')
 
   inputElement.onchange = function(event) {
 
    var file = event.target.files[0];
 
    //Step 2: Read the file using file reader
    var fileReader = new FileReader();  
 
    fileReader.onload = function() {
 
        //Step 4:turn array buffer into typed array
        var typedarray = new Uint8Array(this.result);
 
        //Step 5:pdfjs should be able to read this
        const loadingTask = pdfjsLib.getDocument(typedarray);
        loadingTask.promise.then(pdf => {
 
            document.getElementById('result').innerHTML = "The number of Pages inside pdf document is " + pdf.numPages
            // The document is loaded here...
        });
                    
 
    };
    //Step 3:Read the file as ArrayBuffer
    fileReader.readAsArrayBuffer(file);
}
</script>
</html>

CodePudding user response:

You need two modifications to make it work. First, add the "multiple" attribute to the input to allow the user to select multiple PDF files.

  <input type="file" multiple accept=".pdf" required id="files" >

Then loop through the array of files to calculate the number of pages in each:

[].forEach.call(event.target.files, file => {

Update: Two additional changes have been added.

1. We must reset the file input at the end of the loop. Otherwise it will only work once and then stop.

// clear file selector to allow reuse
event.target.value = "";  

2. We must also set the value "workerSrc" to prevent a console warning message.

pdfjsLib.GlobalWorkerOptions.workerSrc = '//cdnjs.cloudflare.com/ajax/libs/pdf.js/2.7.570/pdf.worker.min.js';

Here is the full snippet (hold the Shift key down to select multiple PDF files):

let inputElement = document.getElementById('files')

inputElement.onchange = function(event) {

  [].forEach.call(event.target.files, file => {

    //var file = event.target.files[i];

    //Step 2: Read the file using file reader
    var fileReader = new FileReader();

    fileReader.onload = function() {

      //Step 4:turn array buffer into typed array
      var typedarray = new Uint8Array(this.result);

      //Step 5:pdfjs should be able to read this
      const loadingTask = pdfjsLib.getDocument(typedarray);
      loadingTask.promise.then(pdf => {

        document.getElementById('result').innerHTML += "<li>" + file.name + " has " + pdf.numPages + " pages</li>";
        // The document is loaded here...
      });


    };
    //Step 3:Read the file as ArrayBuffer
    fileReader.readAsArrayBuffer(file);

  })

  // clear file selector to allow reuse
  event.target.value = "";  

}

// Must set worker to avoid error: Deprecated API usage: No "GlobalWorkerOptions.workerSrc" specified.

pdfjsLib.GlobalWorkerOptions.workerSrc = '//cdnjs.cloudflare.com/ajax/libs/pdf.js/2.7.570/pdf.worker.min.js';
<div >
  <h4 >Count Pages inside PDF Document</h4>
  <div >
    <input type="file" multiple accept=".pdf" required id="files" >
  </div>
  <br><br>
  <ol  id="result"></ol>
</div>



<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.7.570/pdf.min.js" integrity="sha512-g4FwCPWM/fZB1Eie86ZwKjOP+yBIxSBM/b2gQAiSVqCgkyvZ0XxYPDEcN2qqaKKEvK6a05+IPL1raO96RrhYDQ==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>


<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
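
Since the question asks for the total number of pages, here is a rough sketch of how the per-file counts could be summed. The helper names countPages and totalPages are made up for illustration, the lib parameter is assumed to be the pdfjsLib global loaded by the script tag, and Blob.arrayBuffer() is used in place of FileReader only to keep the sketch short (it needs a reasonably modern browser).

```javascript
// Hypothetical helper: resolves to the page count of one PDF, given its bytes.
// `lib` is assumed to be the pdfjsLib global from the pdf.js <script> tag.
function countPages(bytes, lib) {
  return lib.getDocument(bytes).promise.then(pdf => pdf.numPages);
}

// Hypothetical helper: resolves to the summed page count of all selected files.
// `files` can be the FileList from event.target.files or any array of File objects.
function totalPages(files, lib) {
  const counts = Array.from(files, file =>
    // Blob.arrayBuffer() replaces the FileReader steps from the snippet above
    file.arrayBuffer().then(buffer => countPages(new Uint8Array(buffer), lib))
  );
  // Wait for every file, then add the individual counts together
  return Promise.all(counts).then(list => list.reduce((sum, n) => sum + n, 0));
}

// Possible use inside the onchange handler (before clearing the input):
//   totalPages(event.target.files, pdfjsLib).then(total => {
//     document.getElementById('result').innerHTML += "<li>Total: " + total + " pages</li>";
//   });
```

Passing the library in as a parameter is only a convenience so the two functions can be tried out without a browser; calling pdfjsLib directly would work just as well.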

CodePudding user response:

You can't.

Browsers don't let you access local paths on a user's computer for security reasons.

The browser doesn't get to know that the pdf is at /home/USERNAME/confidentialdocs/file.pdf, it just gets a data blob with a given filename.
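
What the browser can load is a URL it is allowed to fetch. getDocument also accepts a URL string in place of the typed array, so if the PDFs are served by the same web server as the page (or with CORS enabled), something like the sketch below might work. countPagesFromUrl and the example path are hypothetical, and lib is assumed to be the pdfjsLib global.

```javascript
// Hypothetical helper: resolves to the page count of a PDF reachable by URL.
// `url` must be something the page may fetch (same origin or CORS-enabled),
// not a local filesystem path such as /home/USERNAME/...
// `lib` is assumed to be the pdfjsLib global from the pdf.js <script> tag.
function countPagesFromUrl(url, lib) {
  return lib.getDocument(url).promise.then(pdf => pdf.numPages);
}

// Possible use (the path is a made-up example, relative to the page):
//   countPagesFromUrl("docs/example.pdf", pdfjsLib)
//     .then(n => console.log(n + " pages"));
```

This still does not list the contents of a folder: the page would need to know the file URLs in advance or get them from the server.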
