Combine multiple tab-separated text files with different numbers of columns into one merged text file

Time:09-20

How can I combine multiple txt files into one merged file, where each file contains a different number of columns (usually float values)? I need one merged file with all the columns, as follows:

EDIT: There is one rule: if a value is non-numeric ("Nan", for example), it must be replaced with the last numeric value before it. In the example below, that is the value to its left in the same row.

file1.txt

1.04
2.26
3.87

file2.txt

5.44    4.65    9.86
8.67    Nan     7.45
8.41    6.54    6.21

file3.txt

6.98    6.52
4.45    8.74
0.58    4.12

merged.txt

1.04    5.44    4.65    9.86    6.98    6.52
2.26    8.67    8.67    7.45    4.45    8.74
3.87    8.41    6.54    6.21    0.58    4.12

I found an answer here that covers the case of one column per file.

How can I do this for multiple columns?

CodePudding user response:

The simplest way is probably using numpy:

import numpy as np

filenames = ["file1.txt", "file2.txt", "file3.txt"]
fmt = '%.2f'    # assuming format is known in advance

all_columns = []
for filename in filenames:
    all_columns.append(np.genfromtxt(filename))

arr_out = np.column_stack(tuple(all_columns))  # Stack columns

# Fill NaN elements with the last numeric value before them (row-major order)
arr_1d = np.ravel(arr_out)  # flat view of arr_out, so writes also update arr_out
nan_indices = np.where(np.isnan(arr_1d))
while len(nan_indices[0]):
    # Copy from the preceding flat index; repeat to handle runs of consecutive NaNs
    new_indices = tuple([i-1 for i in nan_indices])
    arr_1d[nan_indices] = arr_1d[new_indices]
    nan_indices = np.where(np.isnan(arr_1d))

np.savetxt("merged.txt", arr_out, fmt=fmt)

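One caveat (if it matters for you): the fill runs over the flattened array, so a non-numeric value at the start of a row is filled from the last value of the previous row. If the very first (upper-left) element is non-numeric, index -1 wraps around, so it is filled with the last (lower-right) value, or, if that is also non-numeric, the last numeric value before it.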
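If that wrap-around is not what you want, here is a minimal sketch of a standard-library alternative that fills strictly within each row. The helper names parse_rows and merge_rows are made up for illustration, and the file contents are inlined as strings so the snippet runs on its own; in practice the text would come from reading each file. It assumes every file has the same number of rows, and a non-numeric value with nothing numeric to its left simply stays NaN.

```python
import math

def parse_rows(text):
    """Split whitespace/tab-separated text into rows of floats.

    Any cell that cannot be parsed as a number becomes NaN.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = []
        for token in line.split():
            try:
                cells.append(float(token))
            except ValueError:
                cells.append(math.nan)
        rows.append(cells)
    return rows

def merge_rows(tables):
    """Join the i-th row of every table, then forward-fill NaN left to right."""
    merged = []
    for parts in zip(*tables):  # assumes all tables have the same row count
        row = [value for part in parts for value in part]
        last = math.nan
        for i, value in enumerate(row):
            if math.isnan(value):
                row[i] = last  # stays NaN if nothing numeric precedes it in this row
            else:
                last = value
        merged.append(row)
    return merged

# Inline stand-ins for file1.txt, file2.txt and file3.txt from the question
texts = [
    "1.04\n2.26\n3.87\n",
    "5.44\t4.65\t9.86\n8.67\tNan\t7.45\n8.41\t6.54\t6.21\n",
    "6.98\t6.52\n4.45\t8.74\n0.58\t4.12\n",
]
merged = merge_rows([parse_rows(t) for t in texts])
lines = ["\t".join(f"{v:.2f}" for v in row) for row in merged]
print("\n".join(lines))
```

Writing lines to merged.txt, joined with newlines, gives the tab-separated output from the question. Filling per row rather than over the flattened array is a design choice here, not something the question specifies beyond its one example.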
