How can I remove the first n lines from all .txt files in a folder?

I am completely new to Python and have one specific task that I want to complete. I have a large dataset of .XY files (essentially .txt files), each of which has a header of 23 lines. I wish to use Python (Python 3.7, through Visual Studio Code) to remove the header from all files (either deleting it from the original files or writing to new files), keeping the same format as the original file. An example of the top of a file I wish to edit is shown below:

# Distance Sample to Detector: 0.3004918066158592 m
# PONI: 1.261e-01, 1.147e-01 m
# Rotations: 0.000061 0.000011 -0.000000 rad
# 
# == Fit2d calibration ==
# Distance Sample-beamCenter: 300.492 mm
# Center: x=1529.147, y=1680.772 pix
# Tilt: 0.004 deg  TiltPlanRot: 169.652 deg
# 
# Detector Detector  Spline= None    PixelSize= 7.500e-05, 7.500e-05 m
#    Detector has a mask: False 
#    Detector has a dark current: False 
#    detector has a flat field: False 
# 
# Wavelength: 4.1069000000000004e-11 m
# Mask applied: None
# Dark current applied: None
# Flat field applied: None
# Polarization factor: None
# Normalization factor: None
#
# 2th_deg    I
1.441032378E+00  -3.563451171E-01
1.447230367E+00  1.410741210E-01
1.453428356E+00  6.531007886E-01
1.459626345E+00  1.176007986E+00
1.465824333E+00  1.784591913E+00

CodePudding user response：

Open the file, read it in, and then only use the lines you need.

Using with

Using with opens the file and automatically closes the file object once the block completes, even if an error occurs inside the block.

    with open('filename.txt', 'r') as input_file:
        lines = input_file.readlines()
        input_you_need = lines[23:]
        #do something with input_you_need

Using open and close

Using open will keep the file open for the remainder of the script, or until you close it. ALWAYS CLOSE YOUR FILES.

# Using readlines()
file1 = open('myfile.txt', 'r')
Lines = file1.readlines()
lines_needed = Lines[23:]
file1.close()

# writing to file
file1 = open('myfile.txt', 'w')
file1.writelines(lines_needed)
file1.close()
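
The snippets above handle a single file. To apply the same idea to every file in a folder, one possible sketch is shown below. The function name strip_header_in_folder and its parameters are made up for illustration, and it assumes that overwriting the originals is acceptable, so try it on a copy of the data first.

```python
from pathlib import Path


def strip_header_in_folder(folder, n_header=23, pattern="*.txt"):
    """Remove the first n_header lines from every matching file in folder.

    Hypothetical helper: files are overwritten in place.
    Returns the number of files processed.
    """
    count = 0
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue
        # Read every line, then keep only what follows the header.
        with open(path, "r") as input_file:
            lines = input_file.readlines()
        # Reopening in "w" mode truncates the file before writing.
        with open(path, "w") as output_file:
            output_file.writelines(lines[n_header:])
        count += 1
    return count
```

For the files in the question, the call would look something like strip_header_in_folder("path/to/data", n_header=23, pattern="*.XY"). Only the top level of the folder is searched, not its subfolders.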

CodePudding user response：

Your program will need to go through the following steps:

1. Iterate through all the files within your dataset.
2. Read in each file's contents.
3. Write only the required lines into a new file.

For the first step, the os module provides a handy walk() function. It takes a root path and, for that directory and every subdirectory beneath it, yields a tuple of three elements: the path to the directory, a list of the folders it contains, and a list of the files it contains.
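
As a rough illustration of what walk() hands back, the sketch below collects the full path of every file with a given extension. The function name list_files and the default extension are assumptions made for this example only.

```python
import os


def list_files(root, extension=".XY"):
    """Return the full paths of all files under root with the given extension."""
    matches = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # starting with root itself and descending into every subdirectory.
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            # splitext splits "scan_01.XY" into ("scan_01", ".XY").
            if os.path.splitext(filename)[1] == extension:
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Calling list_files(".") would list the .XY files below the current directory. Note that the extension comparison is case-sensitive, so ".xy" files would not be matched.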

Looping over the results of walk() visits every directory, and a nested loop over the file names in the third tuple element visits every file in each one. For each file, you can read its contents using Python's with statement. (The second parameter to the open function tells it that you want to read.)

with open("path/to/file", "r") as f:
    lines = f.readlines()

This allows you to read all lines as a list of strings into the variable lines.

Similarly you can write to a file pretty much the same way but this time you need to specify "w" as the second parameter since you want to have writing access to the new file.

with open("path/to/other_file", "w") as f:
    # writelines does not add line endings, so include them yourself
    f.writelines(["line1\n", "line2\n"])

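This code writes a given list of lines into the file. Note that writelines does not add line endings itself, which is fine here because the lines returned by readlines still end in a newline. Assuming you have already read all lines from the existing file into lines, you can take the lines you need using list slicing: lines[22:] skips the first 22 lines (indices 0 to 21) and returns everything from index 22 onward. The sample in the question shows 22 header lines; if your files have 23, as the question states, use lines[23:] instead.

Therefore you can call f.writelines(lines[22:]) to write them into the new file.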
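
A tiny illustration of how the slice index relates to the number of lines dropped may help here. It uses a made-up six-line list rather than a real file: lines[n:] drops exactly the first n lines.

```python
# Six made-up lines standing in for a file: three header lines, three data lines.
lines = ["# h1\n", "# h2\n", "# h3\n", "1.0 2.0\n", "3.0 4.0\n", "5.0 6.0\n"]

n_header = 3               # number of leading lines to drop
data = lines[n_header:]    # indices 3, 4 and 5 survive

print(data)  # ['1.0 2.0\n', '3.0 4.0\n', '5.0 6.0\n']

# Slicing never raises an IndexError: a file shorter than the header
# simply yields an empty list.
print(lines[10:])  # []
```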

Something similar to this should work for you:

import os

# Walk through the current directory and all of its subdirectories.
# Change the path (".") to suit your needs.
for path, folders, files in os.walk("."):
    for filename in files:
        name, extension = os.path.splitext(filename)

        # Make sure only .XY files are affected, and skip files that
        # were already written by an earlier run of this script.
        if extension != ".XY" or name.endswith("_cut"):
            continue

        # Read all lines from the existing .XY file.
        with open(os.path.join(path, filename), "r") as in_file:
            lines = in_file.readlines()

        # Write everything except the first 22 lines into a new file.
        new_path = os.path.join(path, name + "_cut" + extension)
        with open(new_path, "w") as new_file:
            new_file.writelines(lines[22:])
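
If the files are large, holding every line in a list may be unnecessary. One possible variant, sketched here as an alternative to the readlines approach above, streams each file with itertools.islice from the standard library. The function name strip_header and the default of 22 lines are illustrative assumptions.

```python
from itertools import islice


def strip_header(src_path, dst_path, n_header=22):
    """Copy src_path to dst_path, skipping the first n_header lines.

    src_path and dst_path must be different files, because opening the
    destination in "w" mode truncates it immediately.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        # islice(src, n_header, None) skips n_header lines and then yields
        # the rest one at a time, so the whole file is never held in memory.
        dst.writelines(islice(src, n_header, None))
```

Inside the loop above, this could stand in for the two with blocks, for example strip_header(os.path.join(path, filename), new_path).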