I am trying to write a simple parser for configuration files. For example, there might be a file called INCAR:
NSW = 1000
POTIM = 1
TEBEG = 300
If I'd like to extract the value of POTIM, I could use awk to extract the text between blanks with the following script:
#!/bin/bash
vaspT ()
{
if [ -f INCAR ]; then
local potim=$(grep POTIM INCAR | awk '{print $3}');
else
local potim=1;
fi;
echo "# Time step: ${potim}" > .vasp_md.dat;
echo "# Step Temperature Total_energy E_pot E_kin" >> .vasp_md.dat;
}
vaspT
but if someone doesn't follow the alignment rule and writes the configuration file like:
NSW = 1000
POTIM=1
TEBEG=300
then splitting on blanks no longer works, and I have to use another delimiter.
My question is:
Is there a simple solution or an existing library (Python or Bash is acceptable) for this kind of job?
Answer:
You can use regex in this case:
import re
myString = """
NSW = 1000
POTIM = 1
TEBEG=300
"""
print(re.findall(r"POTIM(\s+)?=(\s+)?(\d+)", myString))
Output
[(' ', ' ', '1')]
With this pattern, no matter how many spaces surround the =, the last element of each tuple (if there is a match) is the value you want.
Another Example
import re
myString = """
NSW = 1000
POTIM=1
TEBEG=300
"""
print(re.findall(r"POTIM(\s+)?=(\s+)?(\d+)", myString))
Output
[('', '', '1')]
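The same regex idea can be wrapped in a small helper that works for any key and falls back to a default. This is only a sketch: `get_value` and its `default` parameter are hypothetical names, and it assumes one `KEY = VALUE` assignment per line. Anchoring the pattern at the start of a line and limiting the padding to spaces and tabs keeps a key such as `POTIM` from matching inside a longer name and keeps a match from spilling onto the next line.

```python
import re

def get_value(text, key, default=None):
    """Return the value assigned to `key` in `text`, or `default` if absent.

    Hypothetical helper: assumes one `KEY = VALUE` assignment per line and
    takes the value to be the first run of non-blank characters after "=".
    """
    # [ \t]* instead of \s* so a match can never run onto the next line.
    pattern = rf"^[ \t]*{re.escape(key)}[ \t]*=[ \t]*(\S+)"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else default

incar = """
NSW = 1000
POTIM=1
TEBEG   =   300
"""
print(get_value(incar, "POTIM"))      # 1
print(get_value(incar, "TEBEG"))      # 300
print(get_value(incar, "ISIF", "2"))  # 2
```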
Answer:
I would use cut for this use case:
grep POTIM INCAR | cut -d "=" -f 2 | sed s/\ //g
cut -d "=" -f 2 takes the second field, using = as the delimiter.
sed s/\ //g removes the spaces around the value (in fact it removes every space, including any inside the value).
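For comparison, a rough Python equivalent of the grep/cut/sed pipeline, assuming the file has already been read into a list of lines. The helper name `first_value` is hypothetical. Splitting at the first = with `str.partition` and trimming only the ends keeps inner spaces (the sed step would turn `my run` into `myrun`) and keeps a value that itself contains =, which `cut -f 2` would truncate. Unlike the pipeline, it compares the whole key rather than any substring, and it returns only the first match.

```python
def first_value(lines, key):
    """Hypothetical helper: value of the first `key = value` line, or None."""
    for line in lines:
        name, sep, value = line.partition("=")  # split at the first "=" only
        if sep and name.strip() == key:
            return value.strip()  # trim only the ends, keep inner spaces
    return None

lines = ["NSW = 1000", "POTIM=1", "SYSTEM = my run", "TEBEG = 300"]
print(first_value(lines, "POTIM"))   # 1
print(first_value(lines, "SYSTEM"))  # my run
```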
Answer:
You could use awk and set the field separator to = surrounded by optional spaces.
If the first field is POTIM, then print the second field.
awk -F"[[:space:]]*=[[:space:]]*" '
$1=="POTIM" {print $2}
' file
Output
1
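The awk field separator translates directly to Python's `re.split`. A sketch under the same one-assignment-per-line assumption; `values_for` is a hypothetical name. With the default `maxsplit=0` it splits at every =, as awk does, so `X = a=b` yields `a`; passing `maxsplit=1` keeps `a=b` intact.

```python
import re

# Same separator as the awk -F option: "=" with optional whitespace on either side.
SEP = re.compile(r"\s*=\s*")

def values_for(lines, key, maxsplit=0):
    """Return the second field of every line whose first field equals `key`.

    maxsplit=0 splits at every "=" like awk -F; maxsplit=1 keeps "a=b" whole.
    Unlike awk, a line with no "=" is skipped instead of printing an empty field.
    """
    result = []
    for line in lines:
        fields = SEP.split(line.rstrip("\n"), maxsplit=maxsplit)
        if fields[0] == key and len(fields) > 1:
            result.append(fields[1])
    return result

print(values_for(["NSW = 1000", "POTIM=1", "TEBEG = 300"], "POTIM"))  # ['1']
```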
Answer:
You can create a dictionary containing the variables and their value:
import re
with open("filename", "r") as f:
    config = dict(re.findall(r"(\w+)\s*=\s*(\w+)", f.read()))
print(config)
Output:
{'NSW': '1000', 'POTIM': '1', 'TEBEG': '300'}
You can then retrieve the value of each variable easily:
print(config["POTIM"]) # 1
(\w+)\s*=\s*(\w+)
(\w+): First capturing group, matches one or more word characters.
\s*: Matches zero or more whitespace characters.
=: Matches a literal =.
\s*: Matches zero or more whitespace characters.
(\w+): Second capturing group, matches one or more word characters.
For each match, re.findall creates a tuple containing the capturing groups. dict() then converts the resulting list of (key, value) tuples into a dictionary.
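One caveat with `\w+` for the value: it stops at the first non-word character, so `POTIM = 0.5` comes out as `'0'`. Below is a sketch of a variant (the `ASSIGNMENT` and `parse_config` names are hypothetical) that still assumes one assignment per line but takes everything after the = up to the end of the line, minus surrounding blanks. It does not handle comments, so a trailing comment would end up in the value.

```python
import re

# One KEY = VALUE per line; the value is the rest of the line, trimmed.
ASSIGNMENT = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)

def parse_config(text):
    """Hypothetical helper: map each key to its (string) value."""
    return dict(ASSIGNMENT.findall(text))

incar = """NSW = 1000
POTIM = 0.5
TEBEG=300
"""
print(parse_config(incar))  # {'NSW': '1000', 'POTIM': '0.5', 'TEBEG': '300'}
```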
Answer:
Using sed
#!/bin/bash
vaspT ()
{
if [ -f INCAR ]; then
local potim
potim=$(sed -n '/POTIM/s/.*=[[:space:]]*\(.*\)/\1/p' INCAR)
else
local potim
potim=1
fi
echo "# Time step: ${potim}" > .vasp_md.dat
echo "# Step Temperature Total_energy E_pot E_kin" >> .vasp_md.dat
}
vaspT
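If the rest of the workflow can be Python, the whole `vaspT` function could be sketched like this. The function name `vasp_t` and its parameters are hypothetical, and the default of 1 mirrors the shell version. One behavioral difference from the sed version: if INCAR exists but has no POTIM line, this falls back to the default instead of writing an empty value, and if POTIM appears more than once, the last assignment wins.

```python
import os
import re

def vasp_t(incar="INCAR", out=".vasp_md.dat", default_potim="1"):
    """Write the .vasp_md.dat header, taking POTIM from `incar` when present."""
    potim = default_potim
    if os.path.isfile(incar):
        with open(incar) as f:
            for line in f:
                m = re.match(r"[ \t]*POTIM[ \t]*=[ \t]*(\S+)", line)
                if m:
                    potim = m.group(1)  # the last POTIM line wins
    with open(out, "w") as f:
        f.write(f"# Time step: {potim}\n")
        f.write("# Step Temperature Total_energy E_pot E_kin\n")
    return potim
```

Calling `vasp_t()` in the directory that holds INCAR writes `.vasp_md.dat` there, just like the shell function.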
Answer:
Is there a simple solution or an existing library (...) Python (...) for this kind of job?
There is configparser in the Python standard library, but it assumes that there is always a section header, so you need to add one if your file does not have it. Consider the following example, where file.txt contains:
ZERO=0
LEFT =1
RIGHT= 1
BOTH = 2
MULTI = 3
Then it can be used as follows:
import configparser
config = configparser.ConfigParser()
with open("file.txt", "r") as f:
    config.read_string('[default]\n' + f.read())
print(config['default']['ZERO']) # 0
print(config['default']['LEFT']) # 1
print(config['default']['RIGHT']) # 1
print(config['default']['BOTH']) # 2
print(config['default']['MULTI']) # 3
Explanation: I add a [default] header line so that configparser works. Note that this is a workaround; you might instead require users to put a section header in their files, in which case usage becomes easier:
import configparser
config = configparser.ConfigParser()
config.read("file.txt")
...
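Instead of reading the whole file into a string, the fake header can also be prepended lazily with `itertools.chain`, since `ConfigParser.read_file` accepts any iterable of lines. A sketch; `read_headerless` is a hypothetical helper name. Two settings are worth knowing about here: `ConfigParser` lower-cases option names by default, so setting `optionxform = str` keeps `POTIM` as written, and `interpolation=None` stops a `%` in a value from being treated as an interpolation marker.

```python
import configparser
import itertools

def read_headerless(path, section="default"):
    """Parse a header-less KEY = VALUE file as a single configparser section."""
    config = configparser.ConfigParser(interpolation=None)  # treat "%" literally
    config.optionxform = str  # keep option names as written (default lower-cases them)
    with open(path) as f:
        # Feed a fake "[section]" line first, then the file's own lines.
        config.read_file(itertools.chain([f"[{section}]"], f), source=path)
    return config[section]
```

With the file.txt above, `read_headerless("file.txt")["MULTI"]` should give `'3'`.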
Answer:
You're doing too much in shell. Awk is the tool that the guys who invented shell also invented for shell to call to manipulate text, so just use awk for the whole text manipulation instead of unnecessarily adding other shell commands to feed awk one line at a time, etc.
Your question doesn't tell us what to do if the file exists but doesn't contain a POTIM= line or contains multiple POTIM= lines, or how to handle comments in your file (or what those would look like). So, ignoring the possibility of comments, and guessing that if POTIM= doesn't exist you want to print 1 while if it does exist you want to print the last value seen:
$ cat tst.sh
#!/usr/bin/env bash
vaspT() {
local infile='INCAR'
[[ -f "$infile" ]] || infile='/dev/null'
awk '
{
gsub(/^[[:space:]]+|[[:space:]]+$/,"")
tag = val = $0
sub(/[[:space:]]*=.*/,"",tag)
sub(/[^=]*=[[:space:]]*/,"",val)
tag2val[tag] = val
}
END {
print "# Time step:", ("POTIM" in tag2val ? tag2val["POTIM"] : 1)
print "# Step Temperature Total_energy E_pot E_kin"
}
' "$infile" > .vasp_md.dat
}
vaspT
$ ./tst.sh
$ cat .vasp_md.dat
# Time step: 1
# Step Temperature Total_energy E_pot E_kin
I use this:
{
gsub(/^[[:space:]]+|[[:space:]]+$/,"")
tag = val = $0
sub(/[[:space:]]*=.*/,"",tag)
sub(/[^=]*=[[:space:]]*/,"",val)
tag2val[tag] = val
}
instead of just:
BEGIN { FS = "[[:space:]]*=[[:space:]]*" }
{ tag2val[$1] = $2 }
so the code would continue to work if there are leading or trailing spaces on the line, or if the value contained an =, e.g.:
NSW = 1000
POTIM = "foo=bar"
TEBEG = 300
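A Python sketch of the same tag2val logic, for readers who prefer to keep the parsing in Python: trim the line, split at the first =, trim both sides, and let later assignments overwrite earlier ones. `parse_tags` is a hypothetical name. Unlike the awk version, it skips lines that contain no = rather than storing them.

```python
def parse_tags(lines):
    """Map each tag to its value; later assignments overwrite earlier ones."""
    tag2val = {}
    for line in lines:
        line = line.strip()                  # drop leading/trailing spaces
        tag, sep, val = line.partition("=")  # split at the first "=" only
        if sep:                              # skip lines without "=" (awk keeps them)
            tag2val[tag.strip()] = val.strip()
    return tag2val

tags = parse_tags(["NSW = 1000", 'POTIM = "foo=bar"', "TEBEG = 300"])
print(tags["POTIM"])          # "foo=bar"
print(tags.get("ISIF", "1"))  # 1
```

Using `.get("POTIM", "1")` reproduces the awk script's fallback to 1 when POTIM is missing.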