Home > Mobile >  Script to copy all text from one file into template file
Script to copy all text from one file into template file

Time:10-29

I'm extremely new to scripting (this will be the first I've ever written on my own), and I'm struggling...

Essentially, a program I'm using outputs a multiple lists of atomic cartesian coordinates for a molecule, where each set of coordinates contains a slightly different geometry. I then want to run calculations on this molecule at each of these geometries in another program. To do this, I need to create input files containing these coordinates based on a template file, and am hoping to do this via a bash or zsh script.

The first program outputs all geometries in a single file, of the form:

  13
        -15.02035015
 C          3.0629012683       -0.1237662359       -0.0004161296
 C          1.5725410176       -0.4599705612       -0.0010537192
 H          3.6545324244       -1.0351015878       -0.0040975574
 H          3.3232895577        0.4531937573        0.8855087768
 H          3.3225341254        0.4598595336       -0.8822056347
 N          0.6972014643        0.7054585380        0.0017284824
 H          1.3274001069       -1.0545725774        0.8830977697
 H          1.3271390154       -1.0504225891       -0.8878762403
 H          0.8745667924        1.2800026498       -0.8166554074
 H          0.8753847581        1.2767560879        0.8221982135
 S         -2.4024384670       -0.0657095889       -0.0009217321
 H         -2.1207044390       -1.3609141502        0.0227283569
 H         -1.0945221361        0.2739471520        0.0001162389
  13
        -15.02029090
 C          3.0458878237       -0.1642767706       -0.0538270794
 C          1.5490175255       -0.4572401536        0.0316764611
 H          3.3628431459        0.4546044246        0.7845264240
 H          3.2796163460        0.3602842378       -0.9790015411
 H          3.6124852940       -1.0910645341       -0.0311065021
 N          0.7057821467        0.7323073404        0.0100678359
 H          1.3291157247       -0.9968212951        0.9565729700
 H          1.2449884019       -1.0864558318       -0.8086148493
 H          0.8643815373        1.2571851525       -0.8447936589
 H          0.9361625337        1.3407384060        0.7900308086
 S         -2.3784808925       -0.1009812166       -0.0319557326
 H         -2.4637876581       -0.0476175701        1.2900767837
 H         -1.0744168237        0.2509451631       -0.0171658709
etc...

essentially, one line where the number of atoms is written (always the same number within a file, but will depend on the molecule you are interested in [3 if I'm looking at water, H2O; 5 for methane, CH4; 4 for ammonia, NH3; etc.]), one comment line (in the case of this program, the energies are written there) and then the cartesian coordinates, followed directly by the next set of coordinates. In my test file, there are 49 sets of coordinates.

The template file will look something like this:

#Comment line, molecule number CONF_NUMBER
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...

* xyz 0 1
COORDINATES
*

So, ideally I would end up writing a script which would take the coordinates of each molecule from the coordinates file and generate an input file for every listed geometry based on the template, replacing the COORDINATES text in the template with the one of the geometries (and, if possible, to include a number in the first comment line, replacing CONF_NUMBER with a number matching the directory and file name):

~/c1/molecule-c1-name.inp:

#Comment line, molecule number 1
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...

* xyz 0 1
O 1.0 23.21 1.1
H 2.0 2.90 1.1
H 3.0 2.33 1.1
*

~/c2/molecule-c2-name.inp:

#Comment line, molecule number 2
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...

* xyz 0 1
O 2.0 23.21 1.1
H 3.0 2.70 1.43
H 2.0 2.33 1.1
*

etc...

So far, I've been able to break up the individual geometries into separate each geometry into individual coordinate files and even remove the two lines above each geometry (which are not needed). Unfortunately, I cannot find a way to copy the whole geometry into the template; I'm stuck in a position where only a single line from the coordinate file is copies over per input file. The code I've got so far is:

#!/bin/zsh

input_file=$1


#arguments
while getopts t:j:p:s: flag
do
    case "${flag}" in
        t) template_file=${OPTARG};;
        j) job_file=${OPTARG};;
        p) prefix=${OPTARG};;
        s) suffix=${OPTARG};;
    esac
done

#Determine number of atoms
n_atoms=$(sed -n 1p $1)

#Determine number of lines to separate
splitlines=$(echo $n_atoms | awk '{print ($0 2)}')


#determine number of conformers
n_conformers=$(grep -c "$n_atoms" $1)

#echo $splitlines

#Split the coordinates into individual .xyz files
split -dl $splitlines $1 coords

#Rename coordinate files
for file in coords*
do
    sed -i -e 1,2d $file
    mv "$file" "$file.xyz"
done
rm *-e

#Copy coordinates into template file
n=1
while read file
do
  sed -i "" "s/COORDINATES/"$file"/r" template.inp > ea.h2s-input${n}.inp
((n  ))
done < coords00.xyz #first coordinate file produced

In this example, coords00.xyz is the first of the separated coordinate files the script generates, template.inp is the template file, and ea.h2s-input${n}.inp is the name of the resulting input file (which I will later make customisable with arguments, hopefully).

Bear in mind that, during testing, I've been trying to get the simple things working, so this script is only written to get the first geometry copied into the template file (hence why the files are named explicitly, rather than as variables - although I'm hoping to use the arguments at the beginning of the script to help name each resulting input file), but I can't even get that to work!

Unfortunately, all other forum posts I've found only talk about copying small bits of text (a name, a word, one line of text) into templates - never multiple lines, let alone the entire contents of a file.

I have tried everything I can think of to get this to work, and this script is as close as I have gotten, but I cannot figure out how to print all of the coordinate lines into the template. Any help would be greatly appreciated!

CodePudding user response:

If you are open to using awk:

awk -v tmpl="$(<src.tmpl)" 'BEGIN{cnt=1} \
NR==1 {n_atoms=$1} \
NF==1 {flag=1; close(out); out=sprintf("coords""d"".xyz", cnt)} \
{if(flag==0) {print $0 > out; coord_cnt  }} \
{if(coord_cnt==n_atoms){printf "*\n" > out; coord_cnt=0}}\
{if (NF==1 && $1!=n_atoms) { flag=0; \
printf "#Comment line, molecule number %s\n%s\n", cnt, tmpl > out; cnt =1}}' src.txt

template file contents should look like: (newlines escaped)

# \
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym \
%method \
functional mgga_xc_b97m_v \
end \
etc etc etc... \
\
* xyz 0 1

src.txt contents:

13
        -15.02035015
 C          3.0629012683       -0.1237662359       -0.0004161296
 C          1.5725410176       -0.4599705612       -0.0010537192
 H          3.6545324244       -1.0351015878       -0.0040975574
 H          3.3232895577        0.4531937573        0.8855087768
 H          3.3225341254        0.4598595336       -0.8822056347
 N          0.6972014643        0.7054585380        0.0017284824
 H          1.3274001069       -1.0545725774        0.8830977697
 H          1.3271390154       -1.0504225891       -0.8878762403
 H          0.8745667924        1.2800026498       -0.8166554074
 H          0.8753847581        1.2767560879        0.8221982135
 S         -2.4024384670       -0.0657095889       -0.0009217321
 H         -2.1207044390       -1.3609141502        0.0227283569
 H         -1.0945221361        0.2739471520        0.0001162389
  13
        -15.02029090
 C          3.0458878237       -0.1642767706       -0.0538270794
 C          1.5490175255       -0.4572401536        0.0316764611
 H          3.3628431459        0.4546044246        0.7845264240
 H          3.2796163460        0.3602842378       -0.9790015411
 H          3.6124852940       -1.0910645341       -0.0311065021
 N          0.7057821467        0.7323073404        0.0100678359
 H          1.3291157247       -0.9968212951        0.9565729700
 H          1.2449884019       -1.0864558318       -0.8086148493
 H          0.8643815373        1.2571851525       -0.8447936589
 H          0.9361625337        1.3407384060        0.7900308086
 S         -2.3784808925       -0.1009812166       -0.0319557326
 H         -2.4637876581       -0.0476175701        1.2900767837
 H         -1.0744168237        0.2509451631       -0.0171658709

Output: (note output files numbering starts with '01')

$ ls coords0*
coords01.xyz coords02.xyz

$ cat coords0*
#Comment line, molecule number 1
# 
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym 
%method 
functional mgga_xc_b97m_v 
end 
etc etc etc... 

* xyz 0 1
 C          3.0629012683       -0.1237662359       -0.0004161296
 C          1.5725410176       -0.4599705612       -0.0010537192
 H          3.6545324244       -1.0351015878       -0.0040975574
 H          3.3232895577        0.4531937573        0.8855087768
 H          3.3225341254        0.4598595336       -0.8822056347
 N          0.6972014643        0.7054585380        0.0017284824
 H          1.3274001069       -1.0545725774        0.8830977697
 H          1.3271390154       -1.0504225891       -0.8878762403
 H          0.8745667924        1.2800026498       -0.8166554074
 H          0.8753847581        1.2767560879        0.8221982135
 S         -2.4024384670       -0.0657095889       -0.0009217321
 H         -2.1207044390       -1.3609141502        0.0227283569
 H         -1.0945221361        0.2739471520        0.0001162389
*
#Comment line, molecule number 2
# 
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym 
%method 
functional mgga_xc_b97m_v 
end 
etc etc etc... 

* xyz 0 1
 C          3.0458878237       -0.1642767706       -0.0538270794
 C          1.5490175255       -0.4572401536        0.0316764611
 H          3.3628431459        0.4546044246        0.7845264240
 H          3.2796163460        0.3602842378       -0.9790015411
 H          3.6124852940       -1.0910645341       -0.0311065021
 N          0.7057821467        0.7323073404        0.0100678359
 H          1.3291157247       -0.9968212951        0.9565729700
 H          1.2449884019       -1.0864558318       -0.8086148493
 H          0.8643815373        1.2571851525       -0.8447936589
 H          0.9361625337        1.3407384060        0.7900308086
 S         -2.3784808925       -0.1009812166       -0.0319557326
 H         -2.4637876581       -0.0476175701        1.2900767837
 H         -1.0744168237        0.2509451631       -0.0171658709
*
  • Related