A directory contains many text files with a .txt extension (1620_10.asc_rsmp_1.0.txt, 132_10.asc_rsmp_1.0.txt, etc.). Only the leading digits of the file names differ (for example, 1620 in the first file and 132 in the second). I want to perform some operations on these text files.
The first line of every text file is a string while the rest are floating point numbers.
Step 1:
The first thing I want to do is delete the first line from every existing text file.
Step 2:
I want to convert rows to columns in all text files.
Step 3:
Finally, I want to arrange the files produced in step 2 side by side, ordered by file name (132_10.asc_rsmp_1.0.txt, 1620_10.asc_rsmp_1.0.txt, ...), and save the result in a separate file.
cat 1620_10.asc_rsmp_1.0.txt
TIMESERIES ____, 5605 xxxxxxx, 1 yyy, 1969-11-31T22:52:10.000000, ZZZZZ, FLOAT,
0.0000000000e+00 5.8895751219e-02 1.9720949872e-02 4.7712552071e-02 1.6255806150e-02 5.0983512543e-02
2.4151940813e-02 4.3959767187e-02 1.9066090517e-02 4.8980189213e-02 2.6237709462e-02 4.1379166269e-02
cat 132_10.asc_rsmp_1.0.txt
TIMESERIES ____, 5605 xxxxxxx, 1 yyy, 1980-12-31T23:58:20.000000, ZZZZZ, FLOAT,
2.0337053383e-02 4.7575540537e-02 2.7508078190e-02 3.9923797852e-02 2.1663353231e-02 4.6368790709e-02
2.8194768989e-02 3.8577115641e-02 2.1935380223e-02 4.6024962357e-02 2.9320681307e-02 3.7630711188e-02
Expected output: cat output.txt
2.0337053383e-02   0.0000000000e+00
4.7575540537e-02   5.8895751219e-02
2.7508078190e-02   1.9720949872e-02
3.9923797852e-02   4.7712552071e-02
2.1663353231e-02   1.6255806150e-02
4.6368790709e-02   5.0983512543e-02
2.8194768989e-02   2.4151940813e-02
3.8577115641e-02   4.3959767187e-02
2.1935380223e-02   1.9066090517e-02
4.6024962357e-02   4.8980189213e-02
2.9320681307e-02   2.6237709462e-02
3.7630711188e-02   4.1379166269e-02
My trial code:
with open("*.txt",'r') as f:
    with open("new_file.txt",'w') as f1:
        f.next() # skip header line
        for line in f:
            f1.write(line)
However, it does not produce the expected output. I hope the experts can help. Thanks.
CodePudding user response:
It's unclear exactly what you want. This does what I think you want:
from glob import glob

# Returns a list of all relevant filenames
filenames = glob("*_10.asc_rsmp_1.0.txt")

# All the values will be stored in a dict where the key is the filename, and
# the value is a list of values
# It will be used later on to arrange the values side by side
values_by_filename = {}

# Read each filename
for filename in filenames:
    with open(filename) as f:
        with open(filename + "_processed.txt", "w") as f_new:
            # Skip the first line (header)
            next(f)
            # Add all the values on every line to a single list
            values = []
            for line in f:
                values.extend(line.split())
            # Write each value on a new line in a new file
            f_new.write("\n".join(values))
            # Store the original filename and values to a dict for later
            values_by_filename[filename] = values

# Order the filenames by the number before the first underscore
ordered_filenames = sorted(values_by_filename,
                           key=lambda filename: int(filename.split("_")[0]))

# Arrange the values side by side in a new file
# zip iterates over every list of values at once, yielding the next value
# from every list as a tuple each iteration
lines = []
for values in zip(*(values_by_filename[filename] for filename in ordered_filenames)):
    # Separate each column by 3 spaces, as per your expected output
    lines.append("   ".join(values))

# Write the concatenated values to file with a newline between each row, but
# not at the end of the file
with open("output.txt", "w") as f:
    f.write("\n".join(lines))
output.txt:
2.0337053383e-02   0.0000000000e+00
4.7575540537e-02   5.8895751219e-02
2.7508078190e-02   1.9720949872e-02
3.9923797852e-02   4.7712552071e-02
2.1663353231e-02   1.6255806150e-02
4.6368790709e-02   5.0983512543e-02
2.8194768989e-02   2.4151940813e-02
3.8577115641e-02   4.3959767187e-02
2.1935380223e-02   1.9066090517e-02
4.6024962357e-02   4.8980189213e-02
2.9320681307e-02   2.6237709462e-02
3.7630711188e-02   4.1379166269e-02
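Two details of the approach are easy to miss, so here is a small sketch that isolates them. It uses made-up placeholder values and a hypothetical third file name (20_10.asc_rsmp_1.0.txt) instead of reading real files. First, sorting by the integer prefix gives a different order than a plain string sort. Second, zip stops at the shortest list, so if your files could have different numbers of values, itertools.zip_longest with a fill value might be the safer choice. The "nan" filler and the helper name numeric_prefix are only my assumptions for illustration.

```python
from itertools import zip_longest

# Hypothetical stand-in for values_by_filename; the values are placeholders,
# not data from the real files
values_by_filename = {
    "1620_10.asc_rsmp_1.0.txt": ["a1", "a2", "a3"],
    "132_10.asc_rsmp_1.0.txt": ["b1", "b2"],
    "20_10.asc_rsmp_1.0.txt": ["c1", "c2", "c3"],
}


def numeric_prefix(filename):
    """Return the number before the first underscore as an int."""
    return int(filename.split("_")[0])


# Numeric order: 20, 132, 1620
ordered = sorted(values_by_filename, key=numeric_prefix)
# Plain string order: 132, 1620, 20 (compared character by character)
plain = sorted(values_by_filename)

columns = [values_by_filename[name] for name in ordered]

# zip drops the third row because one list only has two values
rows_zip = list(zip(*columns))
# zip_longest keeps every row and pads the short list with the fill value
rows_longest = list(zip_longest(*columns, fillvalue="nan"))

for row in rows_longest:
    print("   ".join(row))
```

If all your files are guaranteed to hold the same number of values, plain zip as used above is enough.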
Be sure to read the documentation, in particular: