How to run pdfjs in a Lambda?

I'm using the pdfjs library to extract text from a PDF (sample code below). How can I run this code in a Lambda when the path is an S3 location? Do I have to read the file as bytes first, and if so, how do I pass them to the pdfjs library?

// Assumes the pdfjs-dist package; the build path varies between versions.
const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');

// Return the text items found on the first page of the PDF at `path`.
async function getText(path) {
    const doc = await pdfjsLib.getDocument(path).promise;
    const page = await doc.getPage(1);
    const content = await page.getTextContent();
    return content.items.map((item) => item.str);
}

(async () => {
    const data = await getText('./file.pdf');
    console.log(data);
})();

Answer:

It's quite easy.

First, give your Lambda's execution role permission to read objects (s3:GetObject) from the bucket you created.

https://aws.amazon.com/premiumsupport/knowledge-center/lambda-execution-role-s3-bucket/?nc1=h_ls

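Then download the object as bytes with the S3 client from the AWS SDK for JavaScript and pass those bytes to pdfjs instead of a file path. I use Python so I can't give you the exact code, but it should be straightforward.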

https://docs.aws.amazon.com/es_es/sdk-for-javascript/v2/developer-guide/s3-examples.html
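
For what it's worth, here is a rough sketch of how the two steps might fit together. It is not tested against a real bucket and rests on several assumptions: the AWS SDK for JavaScript v2 (the version the link above covers) is bundled with the function, pdfjs-dist is installed with a build at 'pdfjs-dist/legacy/build/pdf.js' (the path differs between versions), and the event carries hypothetical `bucket` and `key` fields. The helper name getTextFromS3 is made up for illustration.

```javascript
// Sketch: read a PDF from S3 as bytes and hand those bytes to pdfjs.
// The S3 client and pdfjs are passed in so the helper can be tried with stubs.
async function getTextFromS3(s3, pdfjsLib, bucket, key) {
  // SDK v2: getObject(...).promise() resolves with Body as a Buffer.
  const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();

  // pdfjs accepts the raw bytes as a typed array via the `data` option.
  const data = new Uint8Array(obj.Body);
  const doc = await pdfjsLib.getDocument({ data }).promise;

  // Same extraction as in the question: text items of the first page.
  const page = await doc.getPage(1);
  const content = await page.getTextContent();
  return content.items.map((item) => item.str);
}

// Lambda entry point. The event shape ({ bucket, key }) is an assumption.
async function handler(event) {
  // Loaded lazily; both packages are assumed to be in the deployment bundle.
  const AWS = require('aws-sdk');
  const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');
  const s3 = new AWS.S3();
  return getTextFromS3(s3, pdfjsLib, event.bucket, event.key);
}

module.exports = { handler, getTextFromS3 };
```

If you are on SDK v3 instead, the Body comes back as a stream rather than a Buffer, so the byte-conversion step would need to change.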