Import a txt file to a 2D array and remove special characters in each line

Time:12-11

I've tried many options to import a text file with this format:

headerline

headerline

headerline

[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2

[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2

[DD/MM/YY - HH:MM:SS:MS] 00974 026.98 -0000.6 -0000.5 N2 ...

The target is to create a 2D array that contains each value from a line as an individual element, with the lines arranged as rows. So far, I only managed to get one entire line as an element in the array using numpy.genfromtxt:

data = numpy.genfromtxt("test.txt", skip_header=3, delimiter=" ")

Any help is much appreciated!

CodePudding user response:

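1- For a simple and Pythonic way, you can do the following:

    import numpy as np

    # Start by reading the entire file with the readlines() method.
    with open('test.txt') as my_file:
        my_array = my_file.readlines()

    # Drop empty lines, then skip the three header lines
    # (the slice assumes the header is exactly three non-empty lines).
    my_array = [l for l in my_array if l.strip()][3:]

    # Split each line on whitespace; split() with no argument also
    # discards the trailing newline.
    my_array = [l.split() for l in my_array]

    # Finally, create a numpy array from your Python list:
    data = np.asarray(my_array)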

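2- On the other hand, you can use the genfromtxt method for a more compact solution and solve it in one line:

    data = numpy.genfromtxt('test.txt', dtype=str, skip_header=3)

Note that:

  • dtype=str tells numpy to treat the values as strings. With the default float dtype, fields such as '[DD/MM/YY' cannot be converted and come back as nan
  • skip_header=3 is kept from your own call and skips the first three lines of the file, so adjust it if your header is laid out differently
  • Leaving out delimiter makes numpy split on any run of whitespace, which avoids empty fields when values are separated by more than one space
  • Numpy automatically skips empty lines in your file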

More info is available in the NumPy documentation for genfromtxt.
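If you want to check the empty-line behaviour without touching your real file, here is a minimal sketch that feeds genfromtxt an in-memory stand-in. The sample text is my assumption of how test.txt is laid out (three header lines with no blank lines between them, then data rows separated by blank lines), and the rows reuse the placeholder format from the question rather than real values.

```python
import io

import numpy as np

# Stand-in for test.txt (assumed layout): three header lines, then data
# rows separated by blank lines, copied from the question's placeholders.
sample = io.StringIO(
    "headerline\n"
    "headerline\n"
    "headerline\n"
    "\n"
    "[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2\n"
    "\n"
    "[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2\n"
    "\n"
    "[DD/MM/YY - HH:MM:SS:MS] 00974 026.98 -0000.6 -0000.5 N2\n"
)

# dtype=str keeps every field as text; with no delimiter given, fields are
# split on whitespace. skip_header=3 drops the first three lines as-is.
data = np.genfromtxt(sample, dtype=str, skip_header=3, encoding="utf-8")

print(data.shape)   # number of rows and columns
print(data[2, 3])   # fourth field of the third data row
```

With this input I would expect three rows of eight fields each. If your real header has blank lines between the header lines, skip_header would need a larger value, since it counts physical lines.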

The result for your first two data lines should look similar to the following:

array([['[DD/MM/YY', '-', 'HH:MM:SS:MS]', '00975', '026.98', '-0000.6',
        '-0000.5', 'N2'],
       ['[DD/MM/YY', '-', 'HH:MM:SS:MS]', '00975', '026.98', '-0000.6',
        '-0000.5', 'N2']], dtype='<U12')
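The array above still contains the square brackets around the timestamp. Your title asks for special characters to be removed but does not say which ones, so the following is only a sketch under my assumption that you mean the brackets and the lone "-" between date and time. The helper name parse_rows and its n_header parameter are made up for illustration.

```python
import re

# Sample text in the question's placeholder format (not real measurements).
SAMPLE = """headerline
headerline
headerline

[DD/MM/YY - HH:MM:SS:MS] 00975 026.98 -0000.6 -0000.5 N2

[DD/MM/YY - HH:MM:SS:MS] 00974 026.98 -0000.6 -0000.5 N2
"""


def parse_rows(text, n_header=3):
    """Return one list of cleaned string fields per data line."""
    # Drop blank lines first, then the header lines.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for ln in lines[n_header:]:
        # Assumption: "special characters" means the square brackets.
        cleaned = re.sub(r"[\[\]]", "", ln)
        # Split on whitespace and drop the standalone "-" separator;
        # negative numbers such as "-0000.6" are kept untouched.
        rows.append([field for field in cleaned.split() if field != "-"])
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

Each row then has seven fields (date, time, and the five values), and np.asarray(rows) turns the list into a 2D string array just like in option 1. The numeric columns can be converted afterwards, for example with astype(float) on a column slice.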

I hope I have understood what you want to do and that my answer is useful!
