Arranging files based on their names

Time:07-04

A directory contains many text files with the .txt extension (1620_10.asc_rsmp_1.0.txt, 132_10.asc_rsmp_1.0.txt, etc.). Only the leading digits of the file names differ (for example, 1620 in the first file and 132 in the second). I want to perform some operations on these files.

The first line of every text file is a string while the rest are floating point numbers.

Step 1: Delete the first line from every existing text file.

Step 2: Convert rows to columns in every text file.

Step 3: Arrange the files produced in step 2 side by side, ordered by file name (132_10.asc_rsmp_1.0.txt, 1620_10.asc_rsmp_1.0.txt, ...), and save the result in a separate file.

cat 1620_10.asc_rsmp_1.0.txt
TIMESERIES ____, 5605 xxxxxxx, 1 yyy, 1969-11-31T22:52:10.000000, ZZZZZ, FLOAT,
 0.0000000000e+00     5.8895751219e-02     1.9720949872e-02     4.7712552071e-02     1.6255806150e-02     5.0983512543e-02
 2.4151940813e-02     4.3959767187e-02     1.9066090517e-02     4.8980189213e-02     2.6237709462e-02     4.1379166269e-02

cat 132_10.asc_rsmp_1.0.txt
TIMESERIES ____, 5605 xxxxxxx, 1 yyy, 1980-12-31T23:58:20.000000, ZZZZZ, FLOAT,
 2.0337053383e-02     4.7575540537e-02     2.7508078190e-02     3.9923797852e-02     2.1663353231e-02     4.6368790709e-02
 2.8194768989e-02     3.8577115641e-02     2.1935380223e-02     4.6024962357e-02     2.9320681307e-02     3.7630711188e-02

Expected output: cat output.txt

 2.0337053383e-02    0.0000000000e+00
 4.7575540537e-02    5.8895751219e-02
 2.7508078190e-02    1.9720949872e-02
 3.9923797852e-02    4.7712552071e-02
 2.1663353231e-02    1.6255806150e-02
 4.6368790709e-02    5.0983512543e-02
 2.8194768989e-02    2.4151940813e-02
 3.8577115641e-02    4.3959767187e-02
 2.1935380223e-02    1.9066090517e-02
 4.6024962357e-02    4.8980189213e-02
 2.9320681307e-02    2.6237709462e-02
 3.7630711188e-02    4.1379166269e-02

My trial code:

with open("*.txt",'r') as f:
    with open("new_file.txt",'w') as f1:
        f.next() # skip header line
        for line in f:
            f1.write(line)

However, it does not produce the expected output. I hope the experts can help. Thanks.

CodePudding user response:

It's unclear exactly what you want. This does what I think you want:

from glob import glob

# Returns a list of all relevant filenames
filenames = glob("*_10.asc_rsmp_1.0.txt")

# All the values will be stored in a dict where the key is the filename, and
# the value is a list of values
# It will be used later on to arrange the values side by side
values_by_filename = {}

# Read each filename
for filename in filenames:
    with open(filename) as f:
        with open(filename + "_processed.txt", "w") as f_new:
            
            # Skip the first line (header)
            next(f)
            
            # Add all the values on every line to a single list
            values = []
            for line in f:
                values.extend(line.split())
            
            # Write each value on a new line in a new file
            f_new.write("\n".join(values))
            
            # Store the original filename and values to a dict for later
            values_by_filename[filename] = values

# Order the filenames by the number before the first underscore
ordered_filenames = sorted(values_by_filename, 
                           key=lambda filename: int(filename.split("_")[0]))

# Arrange the values side by side in a new file
# zip iterates over every list of values at once, yielding the next value
# from every list as a tuple each iteration
lines = []
for values in zip(*(values_by_filename[filename] for filename in ordered_filenames)):
    
    # Separate each column by 3 spaces, as per your expected output
    lines.append("   ".join(values))

# Write the concatenated values to file with a newline between each row, but
# not at the end of the file
with open("output.txt", "w") as f:
    f.write("\n".join(lines))
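
A side note on the sort key used above: a plain sorted() compares file names as strings, character by character, so it only agrees with numeric order by coincidence. The two files in the question happen to sort the same way either way, so the sketch below adds a hypothetical third file name (20_10.asc_rsmp_1.0.txt, not from the question) to show where the two orderings diverge. The helper name numeric_prefix is also just an illustration.

```python
# Filenames from the question, plus a hypothetical third one ("20_...")
# added only to show where the two orderings differ.
filenames = [
    "1620_10.asc_rsmp_1.0.txt",
    "132_10.asc_rsmp_1.0.txt",
    "20_10.asc_rsmp_1.0.txt",
]


def numeric_prefix(filename):
    """Return the digits before the first underscore as an int."""
    return int(filename.split("_")[0])


# String comparison: "132_..." < "1620_..." < "20_..."
as_strings = sorted(filenames)

# Numeric comparison on the prefix: 20 < 132 < 1620
as_numbers = sorted(filenames, key=numeric_prefix)

print(as_strings)
print(as_numbers)
```

If the prefixes could ever be non-numeric, int() would raise ValueError, so this assumes every matched file name starts with digits followed by an underscore.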

output.txt:

 2.0337053383e-02    0.0000000000e+00
 4.7575540537e-02    5.8895751219e-02
 2.7508078190e-02    1.9720949872e-02
 3.9923797852e-02    4.7712552071e-02
 2.1663353231e-02    1.6255806150e-02
 4.6368790709e-02    5.0983512543e-02
 2.8194768989e-02    2.4151940813e-02
 3.8577115641e-02    4.3959767187e-02
 2.1935380223e-02    1.9066090517e-02
 4.6024962357e-02    4.8980189213e-02
 2.9320681307e-02    2.6237709462e-02
 3.7630711188e-02    4.1379166269e-02
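
As for why the trial code in the question does not work: open() takes one literal path and does not expand wildcards, so open("*.txt") looks for a file literally named *.txt. In addition, f.next() only exists in Python 2; in Python 3 the equivalent is next(f). Below is a minimal sketch of step 1 on its own (dropping the header line from every matching file). The function names strip_header and strip_all_headers and the ".noheader" suffix are assumptions made for this example, not anything from the question.

```python
from glob import glob
import os


def strip_header(src_path, dst_path):
    """Copy src_path to dst_path, dropping the first line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # Skip the header; the None default avoids StopIteration on an empty file
        next(src, None)
        for line in src:
            dst.write(line)


def strip_all_headers(directory="."):
    """Apply strip_header to every matching file and return the new paths."""
    written = []
    # glob expands the wildcard into a list of real file names
    pattern = os.path.join(directory, "*_10.asc_rsmp_1.0.txt")
    for path in sorted(glob(pattern)):
        # Assumed suffix: the output name no longer matches the pattern,
        # so rerunning the function does not pick up its own output
        out_path = path + ".noheader"
        strip_header(path, out_path)
        written.append(out_path)
    return written
```

Calling strip_all_headers() in the directory that holds the data files writes one header-free copy per input file and leaves the originals untouched.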

Be sure to read the documentation for the functions used above (glob, sorted, zip).
