Import a txt file into a 2D array and remove special characters in each line

I've tried many options to import a text file with this format:

headerline

[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2

[DD/MM/YY - HH:MM:SS:MS] 00974 026.98 -0000.6 -0000.5 N2 ...

The goal is to create a 2D array in which each value from a line is an individual element, with the lines arranged as rows. So far, I have only managed to get each entire line as a single element of the array using numpy.genfromtxt:

data = numpy.genfromtxt("test.txt", skip_header=3, delimiter=" ")

Any help is much appreciated!

Answer:

1- For a simple, Pythonic approach, you can do the following:

    import numpy

    # Start by reading the entire file with the readlines() method.
    with open('test.txt') as my_file:
        my_array = my_file.readlines()

    # Then skip the header (assumed to be the first line only) and any blank
    # lines while iterating over my_array. split() with no argument also
    # drops the trailing newline and copes with repeated spaces.
    my_array = [l.split() for l in my_array[1:] if l.strip()]

    # Finally, create a NumPy array from your Python list:
    data = numpy.asarray(my_array)
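The title also asks about removing the special characters, which the splitting step above leaves in place. As a rough sketch only: `parse_line` is a hypothetical helper name, the `lines` list stands in for what `readlines()` would return for test.txt, and it assumes the only special characters are the square brackets and the lone `-` between date and time.

```python
def parse_line(line):
    """Split one data line into fields, dropping '[', ']' and the lone '-'."""
    # Replace the brackets with spaces so they cannot stay glued to a field.
    cleaned = line.replace("[", " ").replace("]", " ")
    # split() with no argument handles repeated spaces and the trailing newline.
    return [field for field in cleaned.split() if field != "-"]


# Stand-in for the result of readlines() on test.txt (assumed layout).
lines = [
    "headerline\n",
    "\n",
    "[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2\n",
    "\n",
    "[DD/MM/YY - HH:MM:SS:MS] 00974 026.98 -0000.6 -0000.5 N2\n",
]

# Skip the header (assumed to be the first line) and any blank lines.
rows = [parse_line(l) for l in lines[1:] if l.strip()]
print(rows[0])
```

Negative values such as `-0000.6` are kept, because only a field that is exactly `-` is dropped.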

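2- Alternatively, you can use genfromtxt for a more compact solution and do it in one line (skip_header=1 assumes the header is a single line; without it, the header's different column count raises an error):

    data = numpy.genfromtxt('test.txt', dtype=str, delimiter=" ", skip_header=1)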

Note that:

dtype=str tells NumPy to treat all values as strings.
NumPy automatically skips empty lines in your file.

More information is available in the genfromtxt documentation.

The result for your first two lines should look similar to the following:

array([['[DD/MM/YY', '-', 'HH:MM:SS:MS]', '00975', '026.98', '-0000.6',
        '-0000.5', 'N2'],
       ['[DD/MM/YY', '-', 'HH:MM:SS:MS]', '00974', '026.98', '-0000.6',
        '-0000.5', 'N2']], dtype='<U12')
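Because every field is read as a string, the numeric columns still need converting before any calculation. Here is a minimal sketch, assuming the column layout from the question (three date/time tokens, four numbers, one label). The `io.StringIO` object is only an in-memory stand-in for test.txt, and `skip_header=1` assumes a single header line.

```python
import io

import numpy as np

# In-memory stand-in for test.txt, using the layout shown in the question.
sample = io.StringIO(
    "headerline\n"
    "\n"
    "[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2\n"
    "\n"
    "[DD/MM/YY - HH:MM:SS:MS] 00974 026.98 -0000.6 -0000.5 N2\n"
)

# Read everything as strings; the header is skipped explicitly and
# genfromtxt skips the blank lines on its own.
data = np.genfromtxt(sample, dtype=str, delimiter=" ", skip_header=1)

# Columns 3 to 6 hold the numeric fields (an assumption based on the sample).
values = data[:, 3:7].astype(float)

print(data.shape)
print(values)
```

Keeping the string array and the float array separate avoids forcing the date, time, and label columns into a numeric dtype.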

I hope I have understood what you want to do and that my answer is useful!