How can I combine multiple txt files into one merged file? Each file contains a different number of columns (usually float values), and I need one merged file containing all the columns, as follows:
EDIT: There is one rule: if a value is non-numeric ("Nan", for example), I need to pad it with the last numeric value that came before it.
file1.txt
1.04
2.26
3.87
file2.txt
5.44 4.65 9.86
8.67 Nan 7.45
8.41 6.54 6.21
file3.txt
6.98 6.52
4.45 8.74
0.58 4.12
merged.txt
1.04 5.44 4.65 9.86 6.98 6.52
2.26 8.67 8.67 7.45 4.45 8.74
3.87 8.41 6.54 6.21 0.58 4.12
I saw an answer here for the case of one column in each file.
How can I do this for multiple columns?
CodePudding user response:
The simplest way is probably using numpy:
import numpy as np
filenames = ["file1.txt", "file2.txt", "file3.txt"]
fmt = '%.2f' # assuming format is known in advance
all_columns = []
for filename in filenames:
    all_columns.append(np.genfromtxt(filename))  # non-numeric entries are read as NaN
arr_out = np.column_stack(tuple(all_columns)) # Stack columns
# Fill NaN-elements with last numeric value
arr_1d = np.ravel(arr_out) # flat (row-major) view, so writes also change arr_out
nan_indices = np.where(np.isnan(arr_1d))
while len(nan_indices[0]):
    # Copy the preceding element of the flat array into each NaN position
    new_indices = tuple([i-1 for i in nan_indices])
    arr_1d[nan_indices] = arr_1d[new_indices]
    nan_indices = np.where(np.isnan(arr_1d))
np.savetxt("merged.txt", arr_out, fmt=fmt)
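One problem (if it is one for you) that might occur is when the very first, i.e. the upper-left, element is non-numeric. Its predecessor index is -1, which wraps around to the end of the flattened array, so the last (lower-right) value, or the last numeric value before that, would be used. Similarly, a non-numeric value in the first column of a later row is filled from the end of the previous row.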
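If that wrap-around is not wanted, one possible alternative is to fill each row on its own. The sketch below assumes that "last numeric value before it" means the nearest numeric value to the left in the same merged row, which matches the expected merged.txt in the question. The name ffill_rows is a hypothetical helper, not a NumPy function, and leaving a leading NaN untouched is an assumption about the desired behaviour.

```python
import numpy as np

def ffill_rows(arr):
    """Replace each NaN with the nearest non-NaN value to its left in the same row.

    NaNs that have no numeric value to their left are left unchanged.
    (Hypothetical helper for illustration; not part of NumPy.)
    """
    arr = np.atleast_2d(np.asarray(arr, dtype=float))
    n_rows, n_cols = arr.shape
    # Column index of each numeric cell; NaN cells get 0 as a placeholder
    col_idx = np.where(np.isnan(arr), 0, np.arange(n_cols))
    # Running maximum along each row: index of the last numeric cell seen so far
    np.maximum.accumulate(col_idx, axis=1, out=col_idx)
    # Pick, for every cell, the value at that index within its own row
    return arr[np.arange(n_rows)[:, None], col_idx]

merged = np.array([
    [1.04, 5.44, 4.65, 9.86, 6.98, 6.52],
    [2.26, 8.67, np.nan, 7.45, 4.45, 8.74],
    [3.87, 8.41, 6.54, 6.21, 0.58, 4.12],
])
filled = ffill_rows(merged)
```

In the answer's script this could stand in for the while loop, e.g. arr_out = ffill_rows(arr_out) before the call to np.savetxt.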