I have a large DataFrame of coordinates, and I need to write its contents to an input file for another program. At the moment I use a nested, rather complex for-loop with many .write() calls, and it takes a long time to write the file. I can't figure out a smarter way to do it.
DataFrame
| type | Uiso   | x        | y        | z        |
|------|--------|----------|----------|----------|
| H    | 0.0320 | 0.257510 | 0.254363 | 0.021930 |
| H    | 0.0330 | 0.255228 | 0.163941 | 0.038431 |
| H    | 0.0330 | 0.255228 | 0.163941 | 0.038431 |
| C    | 0.0278 | 0.122879 | 0.207314 | 0.027545 |
| H    | 0.0320 | 0.534974 | 0.254363 | 0.021930 |
Input file format
- First line: atom type.
- Second line: number of atoms of this type in the structure, the atomic number (H: 1, C: 6, etc.), the occupancy (always 1), then Uiso.
- Remaining lines: x y z coordinates, one atom per line.
This block is repeated for each atom type in the DataFrame.
H
4 1 1.000 0.0320
0.257510 0.254363 0.021930
0.255228 0.163941 0.038431
0.255228 0.163941 0.038431
0.534974 0.254363 0.021930
C
1 6 1.000 0.0278
0.122879 0.207314 0.027545
Code:
f = open(file_name + ".inp", "w")
for type in structure["type"].unique():
    for Uiso in structure[structure["type"] == type]["Uiso"].unique():
        f.write(" " + type + "\n")
        f.write(" " + str(len(list(structure[(structure["type"] == type) & (structure["Uiso"] == Uiso)][["x", "y", "z"]].iterrows()))) + " ")
        f.write(str(atomic_numbers[type]))
        f.write(str(" 1.000 "))
        f.write(str(Uiso) + "\n")
        for coord in structure[(structure["type"] == type) & (structure["Uiso"] == Uiso)].itertuples():
            f.write(" " +
                    str(coord.x) + " " +
                    str(coord.y) + " " +
                    str(coord.z) + "\n")
f.close()
Question
Is there a way to reduce the number of loops? Maybe write whole sorted portions of DataFrame at once?
Answer:
There are lots of ways to approach this, e.g. making a template string and then filling it with format(), as in the sketch below.
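Roughly, the template route could look like this. This is only a minimal sketch: the atomic_numbers lookup table, the output filename and the field formats are my own assumptions, not something from the question, so adjust them to your real format.

header = "{atom}\n {count} {anum} 1.000 {uiso:.4f}\n"  # per-block header template
row = " {x:.6f} {y:.6f} {z:.6f}\n"                     # per-atom coordinate template

atomic_numbers = {'H': 1, 'C': 6}  # illustrative lookup table, extend as needed

with open('data.inp', 'wt') as f:
    for atom, group in df.groupby('type'):
        # Fill the header template once per atom type.
        f.write(header.format(atom=atom, count=len(group),
                              anum=atomic_numbers[atom],
                              uiso=group['Uiso'].iloc[0]))
        # Fill the row template once per atom.
        for r in group.itertuples():
            f.write(row.format(x=r.x, y=r.y, z=r.z))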
But your case might be compact enough to do it on the fly, so to speak. For example, this does more or less what you want for a DataFrame df:
import numpy as np

with open('data.inp', 'wt') as f:
    for atom, group in df.groupby('type'):
        # Write the header info using an f-string.
        f.write(f"{atom}\n {len(group)} 1 1.000 {group['Uiso'].iloc[0]:.4f}\n")
        # Write the data table using savetxt for fixed-width columns.
        data = group.loc[:, ['x', 'y', 'z']]
        np.savetxt(f, data, fmt='%.6f', newline='\n')
This produces:
C
1 1 1.000 0.0278
0.122879 0.207314 0.027545
H
4 1 1.000 0.0320
0.257510 0.254363 0.021930
0.255228 0.163941 0.038431
0.255228 0.163941 0.038431
0.534974 0.254363 0.021930
Some notes:
- Always use a context manager for file reading and writing (the with block). It's the safest thing to do.
- I tried using pd.to_csv() but I don't think it can do fixed-width columns, whereas np.savetxt() can.
- I sort of guessed/made up the info on the second line of each record; you might need to change some of that.
- You could change the sort order by throwing sorted() around the groupby, using some appropriate key (e.g. a table to look up atomic numbers or something).
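For instance, a minimal sketch of that last point, assuming a small hand-made lookup table (the atomic_numbers dict below is illustrative and would need to cover all your atom types); it also puts the looked-up atomic number into the header line, which your format seems to want:

import numpy as np

atomic_numbers = {'H': 1, 'C': 6}  # illustrative lookup table, extend as needed

with open('data.inp', 'wt') as f:
    # Sort the (name, group) pairs by atomic number instead of alphabetically.
    for atom, group in sorted(df.groupby('type'),
                              key=lambda item: atomic_numbers[item[0]]):
        f.write(f"{atom}\n {len(group)} {atomic_numbers[atom]} 1.000 "
                f"{group['Uiso'].iloc[0]:.4f}\n")
        np.savetxt(f, group[['x', 'y', 'z']], fmt='%.6f', newline='\n')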