Is there an easy way of getting the latest filename from the latest folder in s3 bucket using node.j

Time:09-17

I have an amazon S3 bucket with the following structure.

s3
|_ Year 2020 folder
|_ Year 2021 folder
|                 |_ Jan
|                 |_ Feb
|                      |_ filename_20210201.txt
|                      |_ filename_20210204.txt 
|_ Year 2023 folder
|                 |_ Jan
|                 |_ Feb
|                 |_ Mar
|                      |_ filename_20230301.txt  

Each year folder has subfolders for each month, and the month folders contain .txt files. Year and month folders are added as they are needed.

How do I get the latest filename from the latest folder using node.js?

CodePudding user response:

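This would be my approach. Note that it uses Node's fs module, so it walks a local (or mounted) directory tree rather than calling the S3 API.

First, a function to check whether a path is a directory: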

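const fs = require('fs');
const path = require('path');

// True when the given path points at a directory.
const isDirectory = source => fs.lstatSync(source).isDirectory();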

Here we order the contents of a directory: first we get every entry (files and directories), map them to an array of objects holding the name and its modification time, then sort that array newest first.

const orderFiles = dir => {
    return fs.readdirSync(dir)
        // Pair each entry name with its last-modified time.
        .map((file) => ({ file, mtime: fs.lstatSync(path.join(dir, file)).mtime }))
        // Newest entry first.
        .sort((a, b) => b.mtime.getTime() - a.mtime.getTime());
}

Pass in a directory. This function recursively orders the directory contents: if the first result (the newest entry) is a directory, it calls itself on that directory; if it is a file, it returns the path to that file.

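const findFile = dir => {
    const files = orderFiles(dir);
    // An empty directory has no newest entry.
    if (files.length === 0) return undefined;
    // Build the full path; the bare entry name would resolve against the working directory.
    const fullPath = path.join(dir, files[0].file);
    return isDirectory(fullPath) ? findFile(fullPath) : path.resolve(fullPath);
}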

CodePudding user response:

Your goal is to "get the latest filename and increment the date by one day".

The only true way to do this is to call ListObjects() and examine the LastModified date on each file.

If your bucket contains a large number of objects, this can be slow, since ListObjects() returns at most 1000 objects per call and you have to page through the results. You could reduce the number of scanned objects by providing a Prefix so that fewer objects are returned. For example, if you know that year = 2023, then you could pass Prefix='2023/'.
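
As a rough sketch of the listing approach, the code below pages through the results and keeps the object with the newest LastModified. It assumes the AWS SDK for JavaScript v3 (@aws-sdk/client-s3) and uses ListObjectsV2 rather than the older ListObjects call; the bucket name and the helper names (newestOf, findLatestObject, makeS3Lister) are made up for illustration. Skipping keys that end in '/' is also an assumption, to ignore the zero-byte "folder" placeholder objects a bucket may contain.

```javascript
// Keep whichever object has the newest LastModified (a Date in SDK v3).
function newestOf(objects, current) {
  let latest = current;
  for (const obj of objects) {
    if (obj.Key.endsWith('/')) continue; // assumption: skip folder placeholders
    if (!latest || obj.LastModified > latest.LastModified) latest = obj;
  }
  return latest;
}

// listPage(token) must resolve to one page shaped like a ListObjectsV2 response:
// { Contents, IsTruncated, NextContinuationToken }.
async function findLatestObject(listPage) {
  let latest;
  let token;
  do {
    const page = await listPage(token);
    latest = newestOf(page.Contents || [], latest);
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return latest; // undefined when nothing matched the prefix
}

// Wiring to S3. The SDK is required lazily, so the helpers above
// can be exercised without AWS access.
function makeS3Lister(bucket, prefix) {
  const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');
  const client = new S3Client({});
  return (token) => client.send(new ListObjectsV2Command({
    Bucket: bucket,
    Prefix: prefix,
    ContinuationToken: token,
  }));
}

// Example (hypothetical bucket name):
// findLatestObject(makeS3Lister('my-bucket', '2023/'))
//   .then((obj) => console.log(obj && obj.Key));
```

The Prefix has to match the real key layout, so with the folder names from the question it would be something other than '2023/'.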

An alternative approach would be:

  • Create a trigger on the S3 bucket that fires when a new object is created
  • Have the trigger call an AWS Lambda function that stores the Key of the object in AWS Systems Manager Parameter Store
  • Later, when you want to know the last Key that was used, you can query the Parameter Store rather than listing the S3 objects

Or, if you control the code that stores the objects in S3, then that code could write to Parameter Store directly, treating it like a mini-database.
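
The trigger-based alternative could look roughly like the sketch below: a Lambda handler that records the key of each newly created object in Parameter Store. It assumes the AWS SDK for JavaScript v3 (@aws-sdk/client-ssm); the parameter name and the helper names (keyFromRecord, makeHandler, makeSsmPut) are hypothetical.

```javascript
// Hypothetical parameter name; pick whatever fits your naming scheme.
const PARAM_NAME = '/myapp/latest-s3-key';

// Keys in S3 event notifications are URL-encoded, with spaces sent as '+'.
function keyFromRecord(record) {
  return decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
}

// Build the handler around an injected putParameter(name, value) function,
// so the logic can be run without AWS access.
function makeHandler(putParameter) {
  return async (event) => {
    const records = event.Records || [];
    if (records.length === 0) return undefined;
    // Store the key of the last record in this notification.
    const key = keyFromRecord(records[records.length - 1]);
    await putParameter(PARAM_NAME, key);
    return key;
  };
}

// Wiring to Parameter Store; the SDK is required lazily.
function makeSsmPut() {
  const { SSMClient, PutParameterCommand } = require('@aws-sdk/client-ssm');
  const client = new SSMClient({});
  return (name, value) => client.send(new PutParameterCommand({
    Name: name,
    Value: value,
    Type: 'String',
    Overwrite: true, // replace the previously stored key
  }));
}

// In the Lambda entry file:
// exports.handler = makeHandler(makeSsmPut());
```

Reading the value back later would use GetParameterCommand with the same Name. Since notifications are not guaranteed to arrive in order, the stored key is the last one processed, which may not always be the newest object.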
