We download loads of raw files every day and want to sort them by name and move them to the corresponding paths in an S3 bucket.
For example, all files that start with FOO_ (i.e. FOO_*.csv) need to be moved to s3://bucket_name/test/FOO, and the ones that start with BAR_ (BAR_*.csv) to s3://bucket_name/test/BAR.
After a lot of research I ended up with the following, but now all the files are copied to both s3://bucket_name/test/FOO and s3://bucket_name/test/BAR. I'm clearly missing some logic but unsure what. Please suggest.
#!/usr/bin/env bash
SFTP="sftp user@sftpserver"
FOLDER="/data"
TARGET="/home/completed"
DEST="FOO BAR"
S3_PREFIX="s3://bucket_name/test/"

FILES=$($SFTP <<EOF
cd $FOLDER
ls
EOF
)
FILES=$(echo $FILES | sed "s/.*sftp> ls//")

(
    echo cd $FOLDER
    for F in $FILES; do
        echo get $F $TARGET
    done
) | $SFTP

for dest in $DEST; do
    ldir="$LOCAL_PREFIX/$dest"
    aws s3 cp $TARGET $S3_PREFIX/$dest --recursive --exclude "*" --include "*.csv"
done
CodePudding user response:
Your script performs a recursive S3 copy of every *.csv file to two different S3 'paths', so each file lands in both. You need to check each filename to know which specific S3 prefix to use.
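In fact, the original loop can be repaired directly by narrowing the --include pattern per destination. A minimal sketch using the paths from the question, with echo as a dry run (remove it to perform the actual copies):

```shell
#!/usr/bin/env bash
# Dry-run sketch using the hypothetical paths from the question;
# "echo" just prints the aws commands it would run.
TARGET="/home/completed"
S3_PREFIX="s3://bucket_name/test"

plan_copies() {
    local dest
    for dest in FOO BAR; do
        # Only files whose names start with the destination prefix match.
        echo aws s3 cp "${TARGET}" "${S3_PREFIX}/${dest}" --recursive \
            --exclude "*" --include "${dest}_*.csv"
    done
}

plan_copies
```

This keeps the recursive copy but makes each pass match only its own files, so FOO_*.csv goes to .../FOO and BAR_*.csv to .../BAR.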
Another possibility is to use find to locate the FOO/BAR .csv files and then check each filename to decide which S3 prefix it belongs under.
Something like:
#!/bin/bash
src_dir="${1:-/tmp/test}"
s3_prefix="s3://bucket_name/test"    # no trailing slash, to avoid a "//" in the object keys

while read -r -d '' line ; do
    if [[ "${line}" == *FOO* ]] ; then
        echo "copying ${line} to ${s3_prefix}/FOO${line}"
        #aws s3 cp "${line}" "${s3_prefix}/FOO${line}"
    else
        echo "copying ${line} to ${s3_prefix}/BAR${line}"
        #aws s3 cp "${line}" "${s3_prefix}/BAR${line}"
    fi
done < <(find "${src_dir}" \( -name 'FOO*.csv' -o -name 'BAR*.csv' \) -print0)

Given a directory structure and files like:

/tmp/test/src/one:
BAR_one.csv BAR_one.txt FOO_one.csv FOO_one.txt
/tmp/test/src/three:
BAR_three.csv FOO_three.csv junk.txt test.tmp
/tmp/test/src/two:
BAR_two.csv FOO_two.csv junk.csv

Sample output would be:

copying /tmp/test/src/three/FOO_three.csv to s3://bucket_name/test/FOO/tmp/test/src/three/FOO_three.csv
copying /tmp/test/src/three/BAR_three.csv to s3://bucket_name/test/BAR/tmp/test/src/three/BAR_three.csv
copying /tmp/test/src/one/FOO_one.csv to s3://bucket_name/test/FOO/tmp/test/src/one/FOO_one.csv
copying /tmp/test/src/one/BAR_one.csv to s3://bucket_name/test/BAR/tmp/test/src/one/BAR_one.csv
copying /tmp/test/src/two/BAR_two.csv to s3://bucket_name/test/BAR/tmp/test/src/two/BAR_two.csv
copying /tmp/test/src/two/FOO_two.csv to s3://bucket_name/test/FOO/tmp/test/src/two/FOO_two.csv
It would probably be worthwhile to add some sanity checking of filenames to ensure they comply with S3 object key naming conventions. Further, shellcheck is a great resource for checking your script for errors.
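As a hedged sketch of such a check, a hypothetical helper might accept only the characters Amazon documents as "safe" in object key names (alphanumerics plus ! - _ . * ' ( )):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file's basename consists of
# characters Amazon documents as "safe" in S3 object key names.
is_safe_s3_name() {
    local name re
    name=$(basename -- "$1")
    re="^[A-Za-z0-9!._*'()-]+$"
    [[ "$name" =~ $re ]]
}

is_safe_s3_name "/tmp/test/src/one/FOO_one.csv" && echo "ok"        # safe name
is_safe_s3_name "/tmp/test/src/one/FOO one.csv" || echo "rejected"  # space is not "safe"
```

A check like this could gate each aws s3 cp call, skipping (or renaming) anything that would produce an awkward key.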