Home > other >  pasting file side by side
pasting file side by side

Time:11-07

I have many ascii files in a directory, i just want to sort the file name numerically and want to paste side by side. Secondly, after pasting i want to make all the column of same length by appending zero at the end.

My files are named as

 data_Z_1   data_N_457 data_E_45
  1.5          1.2       2.3
  2.0          2.3       1.8
               4.5

At first I just want sort the above file names numerically as given below and then want to paste side by side as

data_Z_1  data_E_45 data_N_457
  1.5       2.3      1.2
  2.0       1.8      2.3
                     4.5

Secondly i need to make all the columns equal length in a pasted file, so that output should be like

   1.5    2.3    1.2
   2.0    1.8    2.3
   0.0    0.0    4.5

I tried as below:
ls data_*_* | sort -V 

But it doesnot work.Can anybody help me overcoming this problem.Thanks in advance.

CodePudding user response:

Would you please try the following:

paste $(ls data* | sort -t_ -k3n) | awk -F'\t' -v OFS='\t' '
{for (i=1; i<=NF; i  ) if ($i == "") $i = "0.0"} 1'

Output:

1.5     2.3     1.2
2.0     1.8     2.3
0.0     0.0     4.5
  • sort -t_ -k3n sets the field separator to _ and numerically sorts the filenames on the 3rd field values.
  • The options -F'\t' -v OFS='\t' to the awk command assign input/output field separator to a tab character.
  • The awk statement for (i=1; i<=NF; i ) if ($i == "") $i = "0.0" scans the input fields and sets 0.0 for the empty fields.
  • The final 1 is equivalent to print $0 to print the fields.

[Edit]
If you have huge number of files, it may exceed the capability of bash. Here is an alternative with python using dataframe.

#!/usr/bin/python

import glob
import pandas as pd
import re

files = glob.glob('data*')
dfs = []                                                # list of dataframes
for f in files:
    df = pd.read_csv(f, header=None, names=[f])         # read file and assign column
    num = re.sub(r'.*_', '', f)                         # extract number of the file
    df.loc[-1] = [num]                                  # insert the number in the first row
    df.index = df.index   1                             # adjust the index
    df = df.sort_index()
    df = df.apply(pd.to_numeric, errors='coerce')       # force the cell values to floats
    dfs.append(df)                                      # add as a new column
df = pd.concat(dfs, axis=1, join='outer')               # create a dataframe from the list of dataframes (equivalent of "paste")
df = df.fillna(0)                                       # fill empty cells with 0
df.sort_values(by=0, axis=1, inplace=True)              # sort by the file number
df= df.drop(0, axis=0)                                  # remove the row of file number
print(df.to_string(index=False, header=False))          # print the dataframe removing index and header

which will produce the same results.

  • Related