I'm using the pdfjs library to extract text from a PDF (sample code below). How can I run this code in Lambda when the path is an S3 location? I assume I'll have to read the file as bytes first; how do I then pass those bytes to the pdfjs library?
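// Assumed import: the Node.js (legacy) build of the pdfjs-dist package.
const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');

// Returns the text fragments found on the first page of the PDF at `path`.
async function getText(path) {
  const doc = await pdfjsLib.getDocument(path).promise;
  const page = await doc.getPage(1); // pages are numbered from 1
  const content = await page.getTextContent();
  return content.items.map((item) => item.str);
}

(async () => {
  const data = await getText('./file.pdf');
  console.log(data);
})();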
CodePudding user response:
It's quite easy.
First, you have to authorize your Lambda's execution role to access your previously created bucket.
https://aws.amazon.com/premiumsupport/knowledge-center/lambda-execution-role-s3-bucket/?nc1=h_ls
You can then use the S3 JavaScript SDK to download the object as bytes. I use Python, so I can't give you the exact code, but it should be straightforward.
https://docs.aws.amazon.com/es_es/sdk-for-javascript/v2/developer-guide/s3-examples.html
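For what it's worth, here is a rough sketch of how the two pieces might fit together in JavaScript. It is untested against a real bucket and rests on a few assumptions: the v2 SDK (`aws-sdk`) is packaged with the function, pdfjs is loaded from `pdfjs-dist/legacy/build/pdf.js` (the path differs between pdfjs-dist versions), and the bucket and key arrive as `event.bucket` and `event.key`, which are made-up field names. `getTextFromS3` is likewise just a helper name chosen for illustration. The idea is that `getObject` returns the file as a Buffer, which is copied into a `Uint8Array` and handed to `getDocument` through its `data` option instead of a path.

```javascript
// Sketch only: getTextFromS3, event.bucket and event.key are hypothetical names.

// Downloads one S3 object and returns the text items of its first PDF page.
// `s3` is an AWS SDK v2 S3 client; `pdfjsLib` is the pdfjs-dist module.
async function getTextFromS3(s3, pdfjsLib, bucket, key) {
  // With the v2 SDK on Node.js, Body arrives as a Buffer holding the whole file.
  const { Body } = await s3.getObject({ Bucket: bucket, Key: key }).promise();

  // Copy the bytes into a plain Uint8Array before passing them as `data`.
  const data = new Uint8Array(Body);

  const doc = await pdfjsLib.getDocument({ data }).promise;
  const page = await doc.getPage(1); // pages are numbered from 1
  const content = await page.getTextContent();
  return content.items.map((item) => item.str);
}

// Lambda entry point. The modules are loaded lazily so that the helper
// above can also be exercised with stand-in objects.
async function handler(event) {
  const AWS = require("aws-sdk"); // assumes SDK v2 is bundled with the function
  const pdfjsLib = require("pdfjs-dist/legacy/build/pdf.js"); // assumed path
  return getTextFromS3(new AWS.S3(), pdfjsLib, event.bucket, event.key);
}

if (typeof module !== "undefined") {
  module.exports = { handler, getTextFromS3 };
}
```

Passing the S3 client and pdfjs module in as arguments is only a convenience for trying the helper locally; inlining them in the handler would work just as well. Note that this reads the whole file into memory, so very large PDFs may need a bigger Lambda memory setting.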