I'm new to awk, so I hope someone can help.
I have 55 tab-separated text files (with similar names, all ending in .txt). Every file contains exactly 55 rows and 4 columns; no row or column is missing. The only differences are the values within each file. They look like the samples below (shortened here to 4 rows and 3 files):
1.txt:
Row1 355 389 623
Row2 58 906 1373
Row3 338 3 30
Row4 2 976 0
2.txt:
Row1 543 31 6642
Row2 0 0 453
Row3 45 5 42
Row4 425 12 8
3.txt:
Row1 452 6 352
Row2 765 21 54
Row3 0 45 999
Row4 451 765 3
I would like to keep the first column (the row names) once, and then collect the 3rd column from every file into one output. Edit: and, if possible, add the file names as a header. The output should look like this:
1.txt 2.txt 3.txt
Row1 389 31 6
Row2 906 0 21
Row3 3 5 45
Row4 976 12 765
I tried this code
paste * | awk 'FNR==NR{a[FNR]=$1; next} {print a[FNR],$3}' *.txt > output.txt | column -t
However, it only combines the values of the first 2 files; the values from the other files are missing. What can I do? Thanks!
CodePudding user response:
You could try this awk:
awk -F '\t' '
FNR==NR {table[FNR] = $1}
        {table[FNR] = table[FNR] "\t" $3}
END {
    for (i=1; i<=FNR; i++) {
        print table[i]
    }
}' *
The last (55th) value of FNR is used to print the array, so if the files don't all have the same number of lines, you will need to address that.
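If that ever becomes an issue, one possible adjustment (not part of the original answer) is to remember the largest FNR seen in a separate variable, here called maxfnr, and loop up to that instead; note that rows missing from the first file would still lack a row name:
awk -F '\t' '
FNR==NR      {table[FNR] = $1}
             {table[FNR] = table[FNR] "\t" $3}
FNR > maxfnr {maxfnr = FNR}    # remember the longest file seen so far
END {
    for (i=1; i<=maxfnr; i++) {
        print table[i]
    }
}' *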
If you want to use paste, maybe something like this:
paste * |
awk '
{
    printf "%s", $1
    for (i=3; i<55*4; i+=4) {
        printf "\t%s", $i
    }
    printf "\n"
}'
55*4 is the number of files times the number of columns. Hard coded. There are various methods of counting these if necessary.
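For example, a sketch that loops over awk's built-in NF (the total number of fields on each pasted line) instead of the hard-coded product:
paste * |
awk '
{
    printf "%s", $1
    # NF already reflects the number of pasted columns,
    # so no file count needs to be hard coded
    for (i=3; i<=NF; i+=4) {
        printf "\t%s", $i
    }
    printf "\n"
}'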
CodePudding user response:
Here's a Ruby solution that can handle big files with a heterogeneous number of lines:
#!/usr/bin/env ruby
files = ARGV.map{|arg| File.open(arg)}
close_count = 0
loop do
  row_name = nil
  values = files.map{ |file|
    next if file.closed?
    if line = file.gets
      fields = line.split
      row_name = fields[0] if row_name.nil?
      fields[2]                # keep the 3rd column
    else
      close_count += 1         # this file is exhausted
      file.close
    end
  }
  break if close_count == files.count
  puts "#{row_name}\t" + values.join("\t")
end
# head file{1,2,3}.tsv
==> file1.tsv <==
Row1 1111 111 11
Row2 1112 112 12
==> file2.tsv <==
Row1 2221 221 21
Row2 2222 222 22
Row3 2223 223 23
==> file3.tsv <==
Row1 3331 331 31
# ./script.rb file{1,2,3}.tsv
Row1 111 221 331
Row2 112 222
Row3 223
CodePudding user response:
Assumptions:
- input/output field delimiter is <TAB>
- all input files contain the same number of rows
- all rows have at least 3 columns
- no need to sort the input files (ie, Row1 is always in row #1, Row2 is always in row #2, etc)
- *.txt will expand to a list of files in the desired order (eg, 1.txt, 2.txt, 3.txt, etc); otherwise OP may need to pre-sort the file names before feeding them to the proposed solution (see the sketch after this list)
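If the glob order is not numeric, one possible pre-sort (a sketch assuming GNU sort's -V option, file names without whitespace, and the awk script below saved as a hypothetical merge.awk):
awk -f merge.awk $(printf '%s\n' *.txt | sort -V)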
One awk idea:
awk '
BEGIN   { FS=OFS="\t" }                       # input/output field delimiter is <TAB>
FNR==1  { lines[0]=lines[0] OFS FILENAME }    # append filename to "header" record
FNR==NR { lines[FNR]=$1 }                     # save the "RowN" string
        { lines[FNR]=lines[FNR] OFS $3 }      # append field #3 to the line
END     { for (i=0;i<=FNR;i++)                # loop through the line numbers ...
              print lines[i]                  #   printing each line to stdout
        }
' *.txt
Assuming the 3 sample input files are named {1..3}.txt, this awk script generates:
1.txt 2.txt 3.txt
Row1 389 31 6
Row2 906 0 21
Row3 3 5 45
Row4 976 12 765
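If aligned columns and an output file are wanted, as in the OP's original attempt, one option (a sketch, with merge.awk again standing in for the script above) is to pipe the result through column -t and redirect it:
awk -f merge.awk *.txt | column -t > output.txt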