I have a number of directories, each holding many files plus one metadata file that contains important data such as the author and title. For example:
/data/unorganised_texts/a-long-story
There are many files in each directory, but most importantly each directory includes Data.yaml with contents like this:
Category: Space
Author: Jôëlle Frankschiff
References:
  Title: Historical
  Title: Future
Title: A “long” story!
I need to read these lines into the variables $category, $author and $title, create the matching directory structure, and copy the directory into it like so:
/data/organised_texts/$category/$author/$title
Here is my attempt in bash. It is probably going wrong in multiple places and, as has been suggested, might be better done in Python.
#!/bin/bash
for dir in /data/unorganised_texts/*/
while IFS= read -r line || [[ $category ]]; do
[[ $category =~ “Category:” ]] && echo "$category" && mkdir /data/organised_texts/$category
[[ $author ]]; do
[[ $author =~ “Author:” ]] && echo "$Author"
[[ $title ]]; do
[[ $title =~ “Title:” ]] && echo "$title" && mkdir /data/organised_texts/$category/$title && cp $dir/* /data/organised_texts/$category/$title/
done <"$dir/Data.yaml"
Here is my bash version, since I was experimenting with readarray and command eval, and the bash version was important:
ubuntu:~# bash --version
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Thanks!
CodePudding user response:
One bash idea:
unset cat auth title
while read -r label value
do
    case "${label}" in
        "Category:") cat="${value}" ;;
        "Author:")   auth="${value}" ;;
        "Title:")    title="${value}" ;;
    esac
    if [[ -n "${cat}" && -n "${auth}" && -n "${title}" ]]
    then
        mkdir -p "${cat}/${auth}/${title}"
        # cp ...   # OP can add the desired `cp` command at this point, or after breaking out of the `while` loop
        break
    fi
done < Data.yaml
NOTE: assumes none of the values include linefeeds
Results:
$ find . -type d
.
./Space
./Space/Jôëlle Frankschiff
./Space/Jôëlle Frankschiff/A “long” story!
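To tie this back to the layout in the question, the loop could be wrapped in a function and called once per source directory. What follows is only a sketch under stated assumptions: organise_one is a made-up helper name, the destination root is passed in as an argument, and nested entries (such as those under References:) are assumed to be indented in the real file so that only lines starting in column 1 are treated as top-level keys.

```shell
#!/bin/bash
# Sketch only: organise_one is a hypothetical helper, not part of the answer above.
# Assumption: top-level keys start in column 1 and nested entries are indented.
organise_one() {
    local src=$1 dest_root=$2
    local line label value cat='' auth='' title=''

    [[ -f $src/Data.yaml ]] || return 1

    # "|| [[ -n $line ]]" keeps a final line that has no trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == [[:space:]]* ]] && continue   # skip indented (nested) lines
        read -r label value <<< "$line"           # split on the first run of whitespace
        case $label in
            Category:) cat=$value ;;
            Author:)   auth=$value ;;
            Title:)    title=$value ;;
        esac
    done < "$src/Data.yaml"

    # Refuse to build a path from incomplete metadata
    [[ -n $cat && -n $auth && -n $title ]] || return 1

    # "$src"/. copies the directory contents, hidden files included
    mkdir -p -- "$dest_root/$cat/$auth/$title" &&
        cp -a -- "$src"/. "$dest_root/$cat/$auth/$title/"
}
```

It might then be called as: for dir in /data/unorganised_texts/*/; do organise_one "$dir" /data/organised_texts; done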
CodePudding user response:
- It looks like you have unmatched do-done pairs.
- A line such as [[ $varname ]]; do causes a syntax error, because there is no preceding for, while or until for the do to belong to.
- mkdir -p can create nested directories in a single call.
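As a small illustration of the mkdir -p behaviour: it creates any missing parent directories and does not complain when the target already exists. The path components below are invented for the demo and live under a temporary directory, so nothing here touches /data.

```shell
#!/bin/bash
demo_root=$(mktemp -d)    # hypothetical scratch location for the demo

# One call creates Space, Space/Author and Space/Author/Title
mkdir -p -- "$demo_root/Space/Author/Title"

# Repeating the call is harmless: with -p an existing directory is not an error
mkdir -p -- "$demo_root/Space/Author/Title"
mkdir_status=$?
```

Without -p, the first call would fail because the parent Space/Author does not exist yet.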
Then would you please try the following:
#!/bin/bash
for dir in /data/unorganised_texts/*/; do
    [[ -f $dir/Data.yaml ]] || continue             # skip directories without a yaml file
    category= author= title=                        # reset so values do not leak from the previous directory
    while IFS= read -r line; do                     # read a line of yaml file in "$dir"
        [[ $line =~ ^[[:space:]] ]] && continue     # skip indented (starting with a space) lines
        read -r key val <<< "$line"                 # split on the 1st space into key and val
        val=${val//\//_}                            # replace slash with underscore, just in case
        if [[ $key = "Category:" ]]; then category="$val"
        elif [[ $key = "Author:" ]]; then author="$val"
        elif [[ $key = "Title:" ]]; then title="$val"
        fi
    done < "$dir/Data.yaml"
    destdir="/data/organised_texts/$category/$author/$title"   # destination directory
    if [[ -d $destdir ]]; then                      # check the duplication
        echo "$destdir already exists. skipped."
    else
        mkdir -p "$destdir"                         # create the destination directory
        cp -a -- "$dir"/* "$destdir"                # copy the contents to the destination
        # echo "/data/organised_texts/$category/$author/$title"    # remove "#" to see the progress
    fi
done
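The slash replacement matters because each value becomes a single path component: a slash would silently add an extra directory level, and a value of "." or ".." would point at a directory that already exists. A slightly stricter version of that step could look like the sketch below. Note that safe_component is a hypothetical helper name, and mapping the problem values to an underscore is just one possible policy.

```shell
#!/bin/bash
# Sketch only: safe_component is a hypothetical helper, not part of the script above.
safe_component() {
    local s=$1
    s=${s//\//_}          # a slash would otherwise add an extra directory level
    # an empty value, "." or ".." would resolve to an existing directory
    case $s in
        ''|.|..) s=_ ;;
    esac
    printf '%s\n' "$s"
}
```

It could be applied as val=$(safe_component "$val") in place of the plain substitution. Other characters, including spaces, accents and typographic quotes, are left alone because they are valid in file names.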