I have a huge number of .txt files, each consisting of one-column numeric values, such as:
-0.42424
0.5466
0.9
-0.4577
0
1.32
-0.933
...
Using the code
import numpy as np
My_data = np.loadtxt("/pathtodata")
loads `My_data` into Python. Is there any possibility to tell `np.loadtxt` that it should not load zero values (0), or at least replace them with another value of choice? Of course, one could remove or replace zeros in all txt files by hand, but the number of txt files and the list of values they contain is massive. Therefore, I am looking for an option to do this in Python, ideally without changing the actual data files.
I don't want to remove values/rows that merely start with 0, but rows that contain only 0, such as row 5 in my example above.
CodePudding user response:
I would do it the following way. Let the content of `file.txt` be
-0.42424
0.5466
0.9
-0.4577
0
1.32
-0.933
then
import numpy as np

def getnonzeros(filename):
    # Yield raw byte lines, skipping those that are exactly b"0"
    with open(filename, "rb") as f:
        for line in f:
            if line.strip() == b"0":
                continue
            yield line

arr = np.loadtxt(getnonzeros("file.txt"))
print(arr)
output
[-0.42424 0.5466 0.9 -0.4577 1.32 -0.933 ]
Explanation: `np.loadtxt` can accept a generator yielding `bytes`, so I craft a suitable one. It iterates over the lines of the file opened in read-binary mode (so the whole file is never loaded into memory), skipping lines that are equal to `b"0"` after jettisoning leading and trailing whitespace. Disclaimer: this code assumes zeros in your file are always rendered as `0`, not, for example, `0.0000`.
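If the files may contain zeros written in other forms (as the disclaimer above notes), one possible variant is to parse each line as a float inside the generator and skip exact zeros. This is a sketch, not the original answer's code; the sample file written at the top and the helper name `get_nonzero_lines` are made up for illustration:

```python
import numpy as np

# Write a small sample file containing zeros in different renderings
with open("file.txt", "w") as f:
    f.write("-0.42424\n0.5466\n0.0000\n-0.4577\n0\n1.32\n-0.933\n")

def get_nonzero_lines(filename):
    # Parse each line as a float and skip exact zeros,
    # so forms like "0.0000" or "-0" are filtered as well.
    with open(filename, "rb") as f:
        for line in f:
            stripped = line.strip()
            if not stripped:
                continue  # ignore blank lines
            if float(stripped) == 0.0:
                continue
            yield line

arr = np.loadtxt(get_nonzero_lines("file.txt"))
print(arr)  # all zero rows are gone
```

Note that `float(stripped)` will raise a `ValueError` on non-numeric lines, which may actually be desirable as an early warning about malformed data.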
CodePudding user response:
This seems like an elegant solution to me. However, it removes the 0s after the data has been imported, not while it's being imported. (Not sure if that matters.)
import numpy as np
my_data = np.loadtxt("pathtodata")
my_data = my_data[~(my_data==0)]
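Since the question also asks about replacing zeros with another value of choice, the same boolean mask can be used with `np.where` after loading. A minimal sketch, using an inline array in place of the loaded file and assuming `NaN` as the replacement value:

```python
import numpy as np

# Stand-in for np.loadtxt("pathtodata"); same values as the example file
my_data = np.array([-0.42424, 0.5466, 0.9, -0.4577, 0.0, 1.32, -0.933])

# Replace exact zeros with a placeholder (NaN here) instead of removing them,
# so the array keeps its original length and row positions
cleaned = np.where(my_data == 0, np.nan, my_data)
print(cleaned)
```

Keeping the rows (rather than dropping them) preserves alignment across files, which matters if the rows of different files correspond to each other.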