I have many ascii
files in a directory, i just want to sort the file name numerically and want to paste side by side.
Secondly, after pasting i want to make all the column of same length by appending zero at the end.
My files are named as
data_Z_1 data_N_457 data_E_45
1.5 1.2 2.3
2.0 2.3 1.8
4.5
At first I just want sort the above file names numerically as given below and then want to paste side by side as
data_Z_1 data_E_45 data_N_457
1.5 2.3 1.2
2.0 1.8 2.3
4.5
Secondly i need to make all the columns equal length in a pasted file, so that output should be like
1.5 2.3 1.2
2.0 1.8 2.3
0.0 0.0 4.5
I tried as below:
ls data_*_* | sort -V
But it doesnot work.Can anybody help me overcoming this problem.Thanks in advance.
CodePudding user response:
Would you please try the following:
paste $(ls data* | sort -t_ -k3n) | awk -F'\t' -v OFS='\t' '
{for (i=1; i<=NF; i ) if ($i == "") $i = "0.0"} 1'
Output:
1.5 2.3 1.2
2.0 1.8 2.3
0.0 0.0 4.5
sort -t_ -k3n
sets the field separator to_
and numerically sorts the filenames on the 3rd field values.- The options
-F'\t' -v OFS='\t'
to the awk command assign input/output field separator to a tab character. - The awk statement
for (i=1; i<=NF; i ) if ($i == "") $i = "0.0"
scans the input fields and sets0.0
for the empty fields. - The final
1
is equivalent toprint $0
to print the fields.
[Edit]
If you have huge number of files, it may exceed the capability of bash. Here is an alternative with python
using dataframe
.
#!/usr/bin/python
import glob
import pandas as pd
import re
files = glob.glob('data*')
dfs = [] # list of dataframes
for f in files:
df = pd.read_csv(f, header=None, names=[f]) # read file and assign column
num = re.sub(r'.*_', '', f) # extract number of the file
df.loc[-1] = [num] # insert the number in the first row
df.index = df.index 1 # adjust the index
df = df.sort_index()
df = df.apply(pd.to_numeric, errors='coerce') # force the cell values to floats
dfs.append(df) # add as a new column
df = pd.concat(dfs, axis=1, join='outer') # create a dataframe from the list of dataframes (equivalent of "paste")
df = df.fillna(0) # fill empty cells with 0
df.sort_values(by=0, axis=1, inplace=True) # sort by the file number
df= df.drop(0, axis=0) # remove the row of file number
print(df.to_string(index=False, header=False)) # print the dataframe removing index and header
which will produce the same results.