I need to parse a part of my output file that looks like this (Image is also attached for clarity)
ATOM 1 c 2 c 3 c 4 n 5 h
dE/dx -0.1150239D-01 0.2259669D-01 -0.3153006D-02 -0.2718508D-01 -0.1064344D-01
dE/dy 0.4798462D-02 0.5902019D-01 -0.4060517D-01 0.3404657D-01 -0.1054522D-01
dE/dz 0.6015413D-05 0.3707704D-02 -0.2306249D-02 0.1334956D-02 -0.8032586D-03
ATOM 6 c 7 s 8 n 9 h 10 c
dE/dx 0.3017851D-01 -0.2253417D-01 -0.3195785D-01 -0.4441489D-02 0.8337613D-02
dE/dy -0.2805275D-01 0.1196856D-01 0.1888257D-01 0.1382483D-01 -0.8171767D-01
dE/dz -0.9413310D-03 0.2069422D-03 0.3914382D-03 0.6724659D-03 -0.4316928D-02
ATOM 11 s 12 h 13 h 14 h 15 h
dE/dx 0.3138990D-01 0.8416159D-02 0.1128067D-02 0.8941240D-02 0.4292434D-03
dE/dy 0.3893252D-01 0.3335059D-02 -0.1457401D-01 0.4869915D-02 -0.1418384D-01
dE/dz 0.2767787D-02 0.1357569D-01 -0.7834375D-03 -0.1273530D-01 -0.7764904D-03
resulting FORCE (fx,fy,fz) = (-.874D-10,0.110D-08,0.562D-10)
resulting MOMENT (mx,my,mz) = (-.504D-04,-.543D-04,-.538D-03)
The expected output is :
{'1c_ddz': '-0.3845687D-01', '1c_ddy': '0.2170984D-02', and etc)
the code I have so far looks like below:
class NACParser(ParseSection):
name = "coupling"
nac_coupling = SimpleLineParser(r" cartesian\s nonadiabatic\s coupling\s matrix\s elements\s \((\d )/(\w )\)", ["value", "unit"], types=[int,str])
atom_index = SimpleLineParser(r"<\s*(\d )\s*\|\s*(\w /\w )\s*\|\s*(\d )\s*>", ["num1","atom?","num2"], types=[int,str, int])
ddx = SimpleLineParser(r"d/dx\s (\S )\s (\S )\s (\S )\s (\S )\s (\S )",["1c_ddx","2c_ddx","3c_ddx","4n_ddx","5h_ddx"], types=[str]*5)
ddy = SimpleLineParser(r"d/dy\s (\S )\s (\S )\s (\S )\s (\S )\s (\S )",["1c_ddy","2c_ddy","3c_ddy","4n_ddy","5h_ddy"], types=[str]*5)
ddz = SimpleLineParser(r"d/dz\s (\S )\s (\S )\s (\S )\s (\S )\s (\S )",["1c_ddz","2c_ddz","3c_ddz","4n_ddz","5h_ddz"], types=[str]*5)
parsers=[nac_coupling, atom_index, ddx, ddy, ddz]
def __init__(self):
ParseSection.__init__(self, r" cartesian\s nonadiabatic\s coupling\s matrix\s elements\s \((\d )/(\w )\)",r"maximum component of gradient",multi=True)
and this outputs as follows:
('nac coupling: ', {'1c_ddz': '-0.3845687D-01', '1c_ddy': '0.2170984D-02', 'atom?': 'd/dR', 'num1': 0, 'num2': 1, '5h_ddx': '0.3118277D-03', '5h_ddy': '0.8573042D-03', '5h_ddz': '-0.1580846D-01', '1c_ddx': '0.7336802D-03', 'unit': 'bohr', '2c_ddz': '0.3120165D-02', '2c_ddx': '-0.1305555D-02', '2c_ddy': '0.8126333D-02', 'value': 1, '4n_ddy': '-0.8441980D-02', '4n_ddx': '0.1166107D-02', '4n_ddz': '0.2287865D-02', '3c_ddy': '-0.9954913D-04', '3c_ddx': '-0.2407839D-04', '3c_ddz': '0.1907032D-02'})
which is good but there are some issues:
It only prints from the very last line and this, I think, is because it overwrites due to similar other lines.
This code will only work for this specific molecule and I want something that can work for any molecule. What I mean is : in this example - I have a molecule with 15 atoms and the first atom is c (carbon) , 5th atom is h (hydrogen) and 11th atom is s (sulfur) but the total number of atoms (which is currently 15 ) and the name of atoms can be different when I have different molecule.
So I am wondering how can I write a general code that can work for a general molecule . Any help?
CodePudding user response:
This will to literally what you asked. Maybe you can use this as a basis. I just gather all the atom IDs when I find a line with "ATOM", and create the dict entries when I find a line with "d/d". I would show the output, but I just typed in faked data because I didn't want to retype all of that.
import re
from pprint import pprint
header = r"(\d [a-z]{1,2})"
atoms = []
gather = {}
for line in open('x.txt'):
if len(line) < 5:
continue
if 'ATOM' in line:
atoms = re.findall( header, line )
atoms = [s.replace(' ','') for s in atoms]
continue
if '/d' in line:
parts = line.split()
row = parts[0].replace('/','')
for at,val in zip(atoms,parts[1:]):
gather[at '_' row] = val
pprint(gather)
Here's the output from your test data. I hope you realize that the cut-and-paste data doesn't match the image. The image uses d/dx
, but the cut and paste uses dE/dx
. I have assumed you want the "E" in the dict tag too, but that's easy to fix if you don't.
{'10c_dEdx': '0.8337613D-02',
'10c_dEdy': '-0.8171767D-01',
'10c_dEdz': '-0.4316928D-02',
'11s_dEdx': '0.3138990D-01',
'11s_dEdy': '0.3893252D-01',
'11s_dEdz': '0.2767787D-02',
'12h_dEdx': '0.8416159D-02',
'12h_dEdy': '0.3335059D-02',
'12h_dEdz': '0.1357569D-01',
'13h_dEdx': '0.1128067D-02',
'13h_dEdy': '-0.1457401D-01',
'13h_dEdz': '-0.7834375D-03',
'14h_dEdx': '0.8941240D-02',
'14h_dEdy': '0.4869915D-02',
'14h_dEdz': '-0.1273530D-01',
'15h_dEdx': '0.4292434D-03',
'15h_dEdy': '-0.1418384D-01',
'15h_dEdz': '-0.7764904D-03',
'1c_dEdx': '-0.1150239D-01',
'1c_dEdy': '0.4798462D-02',
'1c_dEdz': '0.6015413D-05',
'2c_dEdx': '0.2259669D-01',
'2c_dEdy': '0.5902019D-01',
'2c_dEdz': '0.3707704D-02',
'3c_dEdx': '-0.3153006D-02',
'3c_dEdy': '-0.4060517D-01',
'3c_dEdz': '-0.2306249D-02',
'4n_dEdx': '-0.2718508D-01',
'4n_dEdy': '0.3404657D-01',
'4n_dEdz': '0.1334956D-02',
'5h_dEdx': '-0.1064344D-01',
'5h_dEdy': '-0.1054522D-01',
'5h_dEdz': '-0.8032586D-03',
'6c_dEdx': '0.3017851D-01',
'6c_dEdy': '-0.2805275D-01',
'6c_dEdz': '-0.9413310D-03',
'7s_dEdx': '-0.2253417D-01',
'7s_dEdy': '0.1196856D-01',
'7s_dEdz': '0.2069422D-03',
'8n_dEdx': '-0.3195785D-01',
'8n_dEdy': '0.1888257D-01',
'8n_dEdz': '0.3914382D-03',
'9h_dEdx': '-0.4441489D-02',
'9h_dEdy': '0.1382483D-01',
'9h_dEdz': '0.6724659D-03'}