Context
I have a pod with two containers:
- main, whose simple job is to display the content of a directory
- sidecar, whose responsibility is to synchronize the content of a blob storage into a predefined directory
To make the synchronization atomic, sidecar downloads the blob storage content into a new temp directory and then switches a symlink in the target directory.
The target directory is shared between the two containers using an emptyDir volume.
Problem
main sees the symlink but cannot list the content behind it.
Question
How can main access the latest synchronized data?
Additional information
Reason
I am trying to achieve what Apache Airflow does with Git-Sync but, instead of using Git, I need to synchronize files from an Azure Blob storage. This is necessary because (1) my content is mostly dynamic and (2) the azureFile volume type has some serious performance issues.
Sync routine
declare -r container='https://mystorageaccount.dfs.core.windows.net/mycontainer'
declare -r destination='/shared/container'
declare -r temp_dir="$(mktemp -d)"
azcopy copy --recursive "$container/*" "$temp_dir"
declare -r temp_file="$(mktemp)"
ln -sf "$temp_dir" "$temp_file"
mv -Tf "$temp_file" "$destination"
What we end up with:
$ ls /shared
container -> /tmp/tmp.doGz2U0QNy
$ ls /shared/container
file1.txt file2.txt
Solution
My initial attempt had two mistakes:
- The symlink target was not present in the volume
- The symlink target pointed to an absolute path in the sidecar container so, from the point of view of the main container, the folder did not exist
Here is the revised routine:
declare -r container='https://mystorageaccount.dfs.core.windows.net/mycontainer'
declare -r destination='/shared/container'
declare -r cache_dir="$(dirname "$destination")"
declare -r temp_dir="$(mktemp -d -p "$cache_dir")"
azcopy copy --recursive "$container/*" "$temp_dir"
ln -sf "$(basename "$temp_dir")" "$cache_dir/symlink"
mv -Tf "$cache_dir/symlink" "$destination"
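The revised routine can be tried without a storage account. The following is only a minimal sketch under stated assumptions: `fake_download` is a hypothetical stand-in for `azcopy copy`, the volume is a scratch directory rather than a real emptyDir mount, and GNU coreutils (`mktemp -p`, `mv -T`) are assumed to be available.

```shell
# Scratch directory standing in for the shared emptyDir volume (assumption).
volume="$(mktemp -d)"
destination="$volume/container"

# Hypothetical replacement for `azcopy copy`: writes a single marker file.
fake_download() {
  printf '%s\n' "$2" > "$1/version.txt"
}

sync_once() {
  local version="$1" cache_dir temp_dir
  cache_dir="$(dirname "$destination")"
  # Download into a fresh directory that lives inside the volume.
  temp_dir="$(mktemp -d -p "$cache_dir")"
  fake_download "$temp_dir" "$version"
  # Create the new relative link under a scratch name, then rename it
  # over the old link so readers never see a half-updated state.
  ln -sfn "$(basename "$temp_dir")" "$cache_dir/symlink"
  mv -Tf "$cache_dir/symlink" "$destination"
}

sync_once v1
first_target="$(readlink "$destination")"
sync_once v2
second_target="$(readlink "$destination")"

# Reads the file through the link, which now points at the second download.
cat "$destination/version.txt"
```

After the second call, the link points at a different directory than after the first, and both targets are bare directory names rather than absolute paths. Old download directories are not removed in this sketch.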
CodePudding user response:
A symlink is just a special kind of file that contains a filename; it doesn't actually contain the file content in any meaningful way, and it doesn't have to point to a file that exists. mktemp(1) by default creates directories in /tmp, which probably isn't in the shared volume.
Imagine putting a physical file folder in a physical file cabinet, writing "the third drawer at the very front" on a Post-It note, driving to another building, and handing the note to a colleague. The Post-It note (the symlink) still exists, but in the other building's (container filesystem's) context, the location it names isn't especially meaningful.
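The effect is easy to reproduce outside Kubernetes. The sketch below is an illustration only: the scratch directory stands in for the shared volume, and `/nonexistent/tmp.example` is a made-up path assumed not to exist on the machine, playing the role of the sidecar's `/tmp` directory as seen from main.

```shell
# Scratch directory standing in for the shared volume (assumption).
volume="$(mktemp -d)"

# Point a link at a path that does not exist here, the way the sidecar's
# /tmp/tmp.XXXX directory does not exist inside the main container.
ln -s /nonexistent/tmp.example "$volume/container"

# The link itself is present...
if [ -L "$volume/container" ]; then link_present=yes; else link_present=no; fi
# ...but following it fails because the target is missing.
if [ -e "$volume/container" ]; then target_reachable=yes; else target_reachable=no; fi

echo "link present: $link_present, target reachable: $target_reachable"
# Prints the stored path, which is all the link really contains.
readlink "$volume/container"
```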
The easiest way around this is to ask mktemp to create the directory directly in the destination volume, and then create a relative-path symlink.
# extract the volume location (you may already have this)
volume_dir=$(dirname "$destination")
# force the download location to be inside the volume
# (mktemp --tmpdir option)
temp_dir=$(mktemp -d --tmpdir="$volume_dir")
# actually do the download
azcopy copy --recursive "$container/*" "$temp_dir"
# set the symlink to a relative-path symlink, since the directory
# and the link are in the same place; avoids problems if the volume
# is mounted in different places in the two containers
# (-n replaces an existing link instead of descending into its target)
ln -sfn "$(basename "$temp_dir")" "$destination"
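To see why the relative link matters when the volume is mounted at different paths, here is a small sketch. It is an approximation, not a real pod: renaming a directory stands in for the second container mounting the same volume somewhere else, and the names `mount_a`, `mount_b`, `relative`, and `absolute` are made up for the illustration.

```shell
base="$(mktemp -d)"
mkdir -p "$base/mount_a/data"
echo hello > "$base/mount_a/data/file1.txt"

# One relative link and one absolute link to the same directory.
ln -s data "$base/mount_a/relative"
ln -s "$base/mount_a/data" "$base/mount_a/absolute"

# Simulate the other container seeing the volume at a different path.
mv "$base/mount_a" "$base/mount_b"

# The relative link is resolved from its own location, so it still works;
# the absolute link still names the old path, which is gone.
if [ -e "$base/mount_b/relative/file1.txt" ]; then relative_ok=yes; else relative_ok=no; fi
if [ -e "$base/mount_b/absolute/file1.txt" ]; then absolute_ok=yes; else absolute_ok=no; fi

echo "relative: $relative_ok, absolute: $absolute_ok"
```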