Join every 3rd column of different files into one file


I'm new to awk, so I hope someone can help.

I have 55 text files like this:

Row1    3553    896     23
Row2    3766    58906   1373
Row53   2976    0       0

I would like to print the first column once (the row names), followed by the 3rd column from each of the 55 files. The output should look like this:

Row1    896     854   456   876    7864  etc.
Row2    58906   542   99    33301  4564  etc.
Row53   0       58    48    7816   0     etc.

I tried this code:

paste * | awk 'FNR==NR{a[FNR]=$0; next} {print a[FNR],$3}' *.txt > output.txt | column -t

However, the output contains the full first file, with all of its columns, followed by only the third column of the second file (5 columns in total). None of the other files appear. What can I do? Thanks!

CodePudding user response:

You could try this awk:

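awk -F '\t' '
# first file only: start each row with its name
FNR==NR {table[FNR] = $1}
# every file: append its third column to the matching row
{table[FNR] = table[FNR] "\t" $3}
END {
    for (i=1; i<=FNR; i++) {
        print table[i]
    }
}' *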

The array is printed using the value of FNR left over from the last (55th) file, so if the files don't all have the same number of lines, you will need to handle that separately.
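One possible way to handle files of different lengths is sketched below. It is only a sketch under some assumptions: the function name join_col3 is made up for illustration, fields are split on whitespace (awk's default) rather than strictly on tabs, every file is non-empty, and the row name is taken from the first file that has that row. Cells with no matching line are left empty.

```shell
# join_col3 (hypothetical helper): print column 1 once, then column 3 of
# every file, tab-separated. Files may have different numbers of lines.
join_col3() {
    awk '
    FNR == 1 { nfiles++ }                 # a new (non-empty) input file starts
    FNR > maxrow { maxrow = FNR }         # remember the length of the longest file
    !(FNR in name) { name[FNR] = $1 }     # row name from the first file that has this row
    { cell[FNR, nfiles] = $3 }            # third column of the current file
    END {
        for (r = 1; r <= maxrow; r++) {
            line = name[r]
            for (f = 1; f <= nfiles; f++)
                line = line "\t" cell[r, f]   # a cell that was never set prints as empty
            print line
        }
    }' "$@"
}
```

It would be called as join_col3 *.txt > output.txt. Unlike the version above, it keeps every third-column value in memory until the END block, which should be fine for 55 small files but may not suit very large inputs.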

If you want to use paste, maybe something like this:

paste * |
awk '{
    printf "%s", $1
    for (i=3; i<55*4; i+=4) {
        printf "\t%s", $i
    }
    printf "\n"
}'

55*4 is the number of files times the number of columns per file; both are hard coded here. If necessary, there are various ways to count them instead.
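As one illustration of counting instead of hard coding, the sketch below passes the number of files to awk and derives the columns per file from NF on the first pasted line. The function name paste_col3 and the variable names are made up for this example, and it assumes that every file has the same number of columns and lines and contains no empty fields.

```shell
# paste_col3 (hypothetical helper): same idea as the paste pipeline, but the
# stride between third columns is computed rather than written as 55*4.
paste_col3() {
    paste "$@" |
    awk -v nfiles="$#" '
    NR == 1 { width = NF / nfiles }       # columns per file, assumed equal for all files
    {
        printf "%s", $1
        for (i = 3; i <= NF; i += width)  # column 3 of file 1, then file 2, and so on
            printf "\t%s", $i
        printf "\n"
    }'
}
```

It would be called as paste_col3 *.txt > output.txt, where the shell supplies the file count through "$#".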

CodePudding user response:

Here's a Ruby solution that can handle big files with differing numbers of lines:

#!/usr/bin/env ruby
files = ARGV.map{|arg| File.open(arg)}
close_count = 0

loop do
  row_name = nil
  # read one line from every file that is still open
  values = files.map{ |file|
    next if file.closed?
    if line = file.gets
      fields = line.split
      row_name = fields[0] if row_name.nil?
      fields[2]              # third column
    else
      file.close             # end of file reached
      close_count += 1
      nil
    end
  }
  break if close_count == files.count
  puts "#{row_name}\t" + values.join("\t")
end

Example run:
# head file{1,2,3}.tsv
==> file1.tsv <==
Row1    1111    111     11
Row2    1112    112     12

==> file2.tsv <==
Row1    2221    221     21
Row2    2222    222     22
Row3    2223    223     23

==> file3.tsv <==
Row1    3331    331     31
# ./script.rb file{1,2,3}.tsv
Row1    111     221     331
Row2    112     222
Row3            223