I would like to write a BASH/ZSH script to create a series of input files to run calculations based on a series of molecular coordinate files which have been generated for me. Do do this, I would like to copy the coordinates into a template file and then place each resulting input file into a separate directory.
The coordinate files all take the form:
16
Coordinates from ORCA-job tma.h2s-2-vpt2-b97m-d4-qz_DH001
N -1.01856437662682 -0.06753190029287 -0.03381525476330
C -1.19660954482440 1.36662477704039 0.00584753001945
H -0.52879789886132 1.79811481102372 0.74381766556842
H -2.22460413600866 1.65299630904871 0.26354625406395
H -0.95890188582374 1.79386241616540 -0.96233649351010
C -1.28542658079723 -0.65492826277353 1.25977378002066
H -2.31817759691074 -0.48376502304456 1.59096903742337
H -0.61568119318365 -0.23019270431631 1.99961200111689
H -1.11650787130907 -1.72558477449900 1.21976548924500
C -1.84182993191365 -0.66637076429011 -1.05941450368454
H -1.59318377282657 -0.24161533539115 -2.02568837580639
H -2.91364004980661 -0.50657782170270 -0.88078168276917
H -1.66188430154905 -1.73508690928373 -1.09888083377706
H 1.05164190689465 -0.14079517174876 -0.18978887741952
S 2.39061532747941 0.04713819635625 -0.09264157716239
H 2.60455667714639 -1.21543465473706 0.28761295155857
where the first line is the number of atoms in the molecule, the second line is a comment line, and the remainder are cartesian coordinates for each atom. All of these files are named <prefix>_DH###.xyz
, where the ###
are numbers (in the above example, 001
), and the prefix is just the name of the file (in the above example, it's tma.h2s-2-vpt2-b97m-d4-qz
). I have ~80 of these coordinate files (so it's not really practical to do it manually), where each of these files contains coordinates for the same molecule, but with a slightly different geometry.
I would like to copy the coordinates in each of these files into a template file, which looks something like this:
#H2S-TMA, Displacement ###: B97M-D4/AVTZ
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc
%pal
nprocs 4
end
* xyz 0 1
$coords
*
where the coordinates would be copied into the space labelled $coords. I'd also like it for each of these files to be called <prefix>_D###.inp
, similar to the original coordinate files, and to be copied into directories called D###
, with all the numbers matching the original coordinate file's name. (So, a coordinate file tma_DH001.xyz
would produce ./D001/tma_D001.inp
- the naming and numbering of each resulting input file is very important)
So far, I have been able to generate each of the new directories, but I can't figure out how to copy the contents of all of the coordinates into the template. I have asked a similar question in the past (Script to copy all text from one file into template file), but that was for splitting the contents of one file into multiple new files; I'm not sure how to copy coordinates from multiple files into the template.
I imagine this will require an awk
or a sed
command, or even cat
, but I don't understand awk
(it's another language entirely!), sed seems to only copy individual lines, and I'm not sure how to direct the cat
output into a specific spot in the template file. What I have been able to come up with so far is:
#!/bin/zsh
input_file=$1
template=$2
#Determine number of atoms
n_atoms=$(sed -n 1p $1)
#determine number of displacements
n_displacements=$(ls -1 *.xyz | wc -l)
#Create directories of the form D000
start_index=1
end_index="$n_displacements"
for ((i=start_index; i<=end_index; i )); do
# format a dirpath with the 3-digits index
printf -v dirpath 'Dd' $i
mkdir -p -- "$dirpath"
done
#Copy coordinates into template file
for file in *${n}.xyz
do sed -i -e 1,2d $file >> $TEMPFILE
do cat $TEMPFILE >> ???
(I've added the $input_file
variable so that I can find out the number of atoms - because all of the coordinate files are of the same molecule, $n_atoms
will always be the same. I thought obtaining the number of atoms might be important to getting this script to work, but now I'm not so sure...)
It's the last part I can't figure out...
This will be the second script I've ever tried to write from scratch, so I am very inexperienced with all of this... Any help would be greatly appreciated!
CodePudding user response:
One option for template processing is the zsh
(e)
parameter expansion flag:
#!/usr/bin/env zsh
local template=${1?Template filename must be specified.}
local baseDir=${2:-'.'}
if [[ ! -f $template ]]; then
print "Template file not found: $template"
exit 1
fi
# Store the contents of the template in a variable.
local templateText=$(<${template})
local coordFile
for coordFile in $baseDir/*.xyz; do
print "Processing coordinate file: $coordFile"
# File name is in the form '....Hnnn.xyz';
# this will parse nnn from that name.
local inputNumber=$coordFile
# Remove '.xyz'.
inputNumber=${inputNumber%.xyz}
# Remove everything up to and including the 'H'.
inputNumber=${inputNumber##*H}
# Subdirectory name is based on the input number.
local outDir=$baseDir/D$inputNumber
# Create the directory if it doesn't exist.
if [[ ! -d $outDir ]]; then
mkdir $outDir
fi
# Output filename is similar to the input name,
# with a different extension.
local outFile=$outDir/${coordFile:t:r}.inp
# Remove the last 'H' in the filename, to match example
# in the question (which may have been a typo).
outFile=${outFile%H*}${outFile##*H}
print "Creating output file: $outFile"
# Store the coordinate information in a variable.
# This needs to match the variable name in the template.
local coords=$(tail -n 3 $coordFile)
# Replace the variable '$coords' in the template with
# the text from the coordinate file, using the
# (e) parameter expansion flag.
print ${(e)templateText} > $outFile
# Another option: the gnu envsubst utility.
# It's not available on every system.
# export coords=$(tail -n 3 $coordFile)
# envsubst < $template > $outFile
done
This uses several different parameter expansions; they are documented here. I'm not sure I completely followed how the files and filenames are laid out, so you may need to adapt this.
Note that template substitution can introduce some security risks, so this kind of code is only suitable for files from known processes and sources.