I'm new to awk, so I hope someone can help.
I have 55 tab-separated text files (with similar names, all ending in .txt). Every file contains exactly 55 rows and 4 columns; no row or column is missing. The only differences are the values within each file. They look like the samples below (shortened here to 4 rows and 3 files):
1.txt:
Row1 355 389 623
Row2 58 906 1373
Row3 338 3 30
Row4 2 976 0
2.txt:
Row1 543 31 6642
Row2 0 0 453
Row3 45 5 42
Row4 425 12 8
3.txt:
Row1 452 6 352
Row2 765 21 54
Row3 0 45 999
Row4 451 765 3
I would like to keep the first column (the row names) once, and then collect the 3rd column from every file into one output. Edit: and, if possible, add the file names as a header. The output should look like this:
1.txt 2.txt 3.txt
Row1 389 31 6
Row2 906 0 21
Row3 3 5 45
Row4 976 12 765
I tried this code
paste * | awk 'FNR==NR{a[FNR]=$1; next} {print a[FNR],$3}' *.txt > output.txt | column -t
However, it only combines the values of the first 2 files; the values from the other files are missing. What can I do? Thanks!
CodePudding user response:
You could try this awk:
awk -F '\t' '
FNR==NR {table[FNR] = $1}
        {table[FNR] = table[FNR] "\t" $3}
END {
    for (i=1; i<=FNR; i++) {
        print table[i]
    }
}' *
The last (55th) value of FNR is used to print the array, so if the files don't all have the same number of lines, you will need to address that.
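If that ever becomes an issue, one possible adjustment (not part of the original answer) is to remember the largest FNR seen in a separate variable, here called maxfnr, and loop up to that instead; note that rows missing from the first file would still lack a row name:
awk -F '\t' '
FNR==NR      {table[FNR] = $1}
             {table[FNR] = table[FNR] "\t" $3}
FNR > maxfnr {maxfnr = FNR}    # remember the longest file seen so far
END {
    for (i=1; i<=maxfnr; i++) {
        print table[i]
    }
}' *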
If you want to use paste, maybe something like this:
paste * |
awk '
{
    printf "%s", $1
    for (i=3; i<55*4; i+=4) {
        printf "\t%s", $i
    }
    printf "\n"
}'
55*4 is the number of files times the number of columns. Hard coded. There are various methods of counting these if necessary.
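For example, a sketch that loops over awk's built-in NF (the total number of fields on each pasted line) instead of the hard-coded product:
paste * |
awk '
{
    printf "%s", $1
    # NF already reflects the number of pasted columns,
    # so no file count needs to be hard coded
    for (i=3; i<=NF; i+=4) {
        printf "\t%s", $i
    }
    printf "\n"
}'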
CodePudding user response:
Here's a Ruby solution that can handle big files with a heterogeneous number of lines:
#!/usr/bin/env ruby
files = ARGV.map{|arg| File.open(arg)}
close_count = 0
loop do
  row_name = nil
  values = files.map{ |file|
    next if file.closed?
    if line = file.gets
      fields = line.split
      row_name = fields[0] if row_name.nil?
      fields[2]                # keep the 3rd column
    else
      close_count += 1         # this file is exhausted
      file.close
    end
  }
  break if close_count == files.count
  puts "#{row_name}\t" + values.join("\t")
end
# head file{1,2,3}.tsv
==> file1.tsv <==
Row1 1111 111 11
Row2 1112 112 12
==> file2.tsv <==
Row1 2221 221 21
Row2 2222 222 22
Row3 2223 223 23
==> file3.tsv <==
Row1 3331 331 31
# ./script.rb file{1,2,3}.tsv
Row1 111 221 331
Row2 112 222
Row3 223
CodePudding user response:
Assumptions:
- input/output field delimiter is <TAB>
- all input files contain the same number of rows
- all rows have at least 3 columns
- no need to sort the input files (ie, Row1 is always in row #1, Row2 is always in row #2, etc)
- *.txt will expand to a list of files in the desired order (eg, 1.txt, 2.txt, 3.txt, etc); otherwise OP may need to pre-sort the file names before feeding them to the proposed solution (see the sketch after this list)
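If the glob order is not numeric, one possible pre-sort (a sketch assuming GNU sort's -V option, file names without whitespace, and the awk script below saved as a hypothetical merge.awk):
awk -f merge.awk $(printf '%s\n' *.txt | sort -V)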
One awk idea:
awk '
BEGIN   { FS=OFS="\t" }                       # input/output field delimiter is <TAB>
FNR==1  { lines[0]=lines[0] OFS FILENAME }    # append filename to "header" record
FNR==NR { lines[FNR]=$1 }                     # save the "RowN" string
        { lines[FNR]=lines[FNR] OFS $3 }      # append field #3 to the line
END     { for (i=0;i<=FNR;i++)                # loop through the line numbers ...
              print lines[i]                  #   printing each line to stdout
        }
' *.txt
Assuming the 3 sample input files are named {1..3}.txt, this awk script generates:
1.txt 2.txt 3.txt
Row1 389 31 6
Row2 906 0 21
Row3 3 5 45
Row4 976 12 765
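If aligned columns and an output file are wanted, as in the OP's original attempt, one option (a sketch, with merge.awk again standing in for the script above) is to pipe the result through column -t and redirect it:
awk -f merge.awk *.txt | column -t > output.txt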