Move downloaded files from sftp to custom path in s3 bucket


We download loads of raw files every day and want to sort the files by name and move them to their respective paths in an S3 bucket.

For example, every file whose name starts with FOO_ (FOO_*.csv) needs to be moved to s3://bucket_name/test/FOO, and every one that starts with BAR_ (BAR_*.csv) to s3://bucket_name/test/BAR.

After a lot of research I ended up with the following, but now all of the files are moved to both s3://bucket_name/test/FOO and s3://bucket_name/test/BAR. I'm clearly missing some logic but am unsure what. Please suggest.

#!/usr/bin/env bash
SFTP="sftp user@sftpserver"
FOLDER="/data"
TARGET="/home/completed"
DEST="FOO BAR"
S3_PREFIX="s3://bucket_name/test/"

FILES=`$SFTP <<EOF
cd $FOLDER
ls
EOF`
FILES=`echo $FILES|sed "s/.*sftp> ls//"` 

(
 echo cd $FOLDER
 for F in $FILES; do
   echo get $F $TARGET
 done
) | $SFTP
for dest in $DEST; do
 ldir="$LOCAL_PREFIX/$dest"

 aws s3 cp $TARGET $S3_PREFIX/$dest --recursive --exclude "*" --include "*.csv"
done

CodePudding user response:

Your script performs a recursive S3 copy of the entire $TARGET directory to each of the two S3 prefixes, so every file ends up in both. You need to check each filename to know which specific S3 prefix to use.
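If you want to keep the shape of your existing loop, a minimal fix is possible (a sketch using your own variable names; the aws CLI applies --exclude/--include filters in order, so the trailing --include wins for matching files): narrow the include pattern so each pass only copies that destination's files.

#!/usr/bin/env bash
TARGET="/home/completed"
DEST="FOO BAR"
S3_PREFIX="s3://bucket_name/test"

for dest in $DEST; do
    # Copy only the files whose names start with this destination's
    # prefix, instead of every *.csv on every pass.
    aws s3 cp "$TARGET" "$S3_PREFIX/$dest" --recursive \
        --exclude "*" --include "${dest}_*.csv"
done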

Alternatively, you could use find to locate the FOO/BAR .csv files to copy and then check each file to determine which S3 prefix to use.

Something like:

#!/bin/bash

src_dir=${1:-/tmp/test}
s3_prefix="s3://bucket_name/test"

while read -r -d '' line ; do
    fname=$(basename "${line}")   # strip the local path so it doesn't end up in the s3 key
    if [[ ${fname} == FOO* ]] ; then
        echo "copying ${line} to ${s3_prefix}/FOO/${fname}"
        #aws s3 cp "${line}" "${s3_prefix}/FOO/${fname}"
    else
        echo "copying ${line} to ${s3_prefix}/BAR/${fname}"
        #aws s3 cp "${line}" "${s3_prefix}/BAR/${fname}"
    fi
done < <(find "${src_dir}" \( -name 'FOO*.csv' -o -name 'BAR*.csv' \) -print0 )

Given a directory structure and files like:

/tmp/test/src/one:
BAR_one.csv BAR_one.txt FOO_one.csv FOO_one.txt

/tmp/test/src/three:
BAR_three.csv FOO_three.csv junk.txt test.tmp

/tmp/test/src/two:
BAR_two.csv FOO_two.csv junk.csv

Sample output would be:

copying /tmp/test/src/three/FOO_three.csv to s3://bucket_name/test/FOO/FOO_three.csv
copying /tmp/test/src/three/BAR_three.csv to s3://bucket_name/test/BAR/BAR_three.csv
copying /tmp/test/src/one/FOO_one.csv to s3://bucket_name/test/FOO/FOO_one.csv
copying /tmp/test/src/one/BAR_one.csv to s3://bucket_name/test/BAR/BAR_one.csv
copying /tmp/test/src/two/BAR_two.csv to s3://bucket_name/test/BAR/BAR_two.csv
copying /tmp/test/src/two/FOO_two.csv to s3://bucket_name/test/FOO/FOO_two.csv
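
Since the goal is to move the files rather than copy them, the same loop also works with aws s3 mv in place of aws s3 cp; it uploads the object and then deletes the local source file on success:

aws s3 mv "${line}" "${s3_prefix}/FOO/${fname}"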

It would probably be worthwhile to add some sanity checking of filenames to ensure compliance with S3 object-naming conventions. Further, shellcheck is a great resource for checking your script for errors.
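
As a minimal sketch of such a check, assuming the conservative "safe" character set AWS documents for object keys (letters, digits, and ! - _ . * ' ( ), plus / as the prefix delimiter):

#!/bin/bash

# Characters AWS documents as needing no special handling in object keys,
# plus "/" as the prefix delimiter.
safe_re="^[A-Za-z0-9/!_.*'()-]+$"

is_safe_key() {
    [[ $1 =~ $safe_re ]]
}

is_safe_key "FOO/FOO_one.csv" && echo "ok"          # passes
is_safe_key "FOO/bad name.csv" || echo "rejected"   # space fails the check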
