I thought it would be easy to define a string such as "1 2 3" and use it within AWK (GAWK) to extract the required fields. How wrong I was.
I have tried creating AWK arrays, BASH arrays, splitting, string substitution etc., but could not find any way to use the resulting 'chunks' (i.e. the column/field numbers) in a print statement.
I believe Akshay Hegde provided an excellent solution with the get_cols function, here,
but it was over 8 years ago, and I am really struggling to work out 'how it works', namely what this is doing:
s = length(s) ? s OFS $(C[i]) : $(C[i])
I am unable to post a comment asking for clarification due to my lack of reputation (and it is an old post). Is someone able to explain how the solution works?
NB: I don't think I need the sub, as I am using the following to clean up the input (replace every run of non-numeric characters with a separator, sort numerically and join the result with commas):
Columns=$(echo "$Input_string" | sed 's/[^0-9][^0-9]*/ /g')
Columns=$(echo $Columns | xargs -n1 | sort -n | paste -sd, -)
(Using this string, the awk in the given solution would be executed with awk -v cols="$Columns" -f test.awk infile.)
Given the informative answer from @Ed Morton, with a nice worked example, I have attempted to remove the need for a function (and also for an additional awk program file). The intention is to have this within a shell script, and I would rather it be self-contained; it is also a chance to investigate further 'how it works'.
Fields="1 2 3"
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print "s="s " arr1="Column[1]" arr2="Column[2]" arr3="Column[3]}'
The results surprised me (see my comment to Ed):
s=1 2 3 arr1=1 arr2=2 arr3=3
The above clearly shows that the split has populated the array, but I thought s would contain a $ for each concatenation made by the ternary operator, i.e. "$1 $2 $3".
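To see why s holds values rather than the text "$1 $2 $3", it may help to build both side by side. In the command above, the only record awk reads is the echoed line itself, so $(Column[i]) takes its values from that line; they merely look like field numbers. The sketch below (the function name show_both is hypothetical, and it uses awk's default whitespace field splitting rather than -F ",") collects the field values with $(C[i]) and the literal labels with "$" C[i]:

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print the selected field
# values and, separately, the literal "$N" labels for comparison.
show_both() {
  # $1: space-separated field numbers, e.g. "3 1"
  awk -v cols="$1" '
  BEGIN { n = split(cols, C, " ") }    # C[1]="3", C[2]="1", ...
  {
    vals = ""; labels = ""
    for (i = 1; i <= n; i++) {
      # $(C[i]) is the value of field number C[i] of the current record
      vals = length(vals) ? vals OFS $(C[i]) : $(C[i])
      # "$" C[i] is only the text "$" followed by the number
      labels = length(labels) ? labels OFS "$" C[i] : "$" C[i]
    }
    print "values=" vals, "labels=" labels
  }'
}

echo "a b c" | show_both "3 1"
# prints: values=c a labels=$3 $1
echo "1 2 3" | show_both "1 2 3"
# prints: values=1 2 3 labels=$1 $2 $3
```

The second call mirrors the question: the values are "1 2 3" only because the record being read is the line "1 2 3".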
Moreover, I was hoping to append the actual file to the above command; I have found that this allows me to use echo $string | awk '{program}' file.name
NB: it is a little insulting that my question has been voted -1, indicating little research effort, as I have spent days trying to work this out.
Taking all the information above, I think s ends up as "1 2 3", but print doesn't accept it in the same way as when it is called from a function; it simply seems to 'print 1 2 3' in relation to the file, which is how all my efforts have ended up. This really confuses me, as Ed's 'diagonal' example works from the command line, which indicates that the concept of 'print s' is absolutely fine when a file name is given as input. Can anyone suggest how the example below can be made to work?
I don't know whether piping with echo and also appending the file name is strictly allowed, but it appears to work (?!?!?!)
(failed result)
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print s}' myfile.txt
This appears to go through myfile.txt and output all of its lines, which contain many comma-separated values, i.e. the whole file (I haven't included the actual values; the commas below are for illustration only):
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
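A likely cause of the failed result is that awk reads only the named file when one is given and ignores standard input, so the echoed field list never reaches the program; each line of myfile.txt is passed to split instead. A minimal self-contained sketch, assuming myfile.txt is comma-separated and the field numbers are space-separated, passes the list in with -v and splits it once in a BEGIN block (the function name print_fields and the sample data are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print the requested fields of every line of a file.
# Assumes a comma-separated input file and a space-separated field list.
print_fields() {
  # $1: field numbers such as "1 3"; $2: input file
  awk -v cols="$1" '
  BEGIN {
    FS = OFS = ","               # read and write comma-separated data
    n = split(cols, C, " ")      # split the field list once, before any input
  }
  {
    s = ""                       # start a fresh string for every record
    for (i = 1; i <= n; i++)
      s = (i > 1) ? s OFS $(C[i]) : $(C[i])
    print s                      # one output line per input line
  }' "$2"
}

tmp=$(mktemp)                    # hypothetical sample file
printf 'a,b,c\nd,e,f\n' > "$tmp"
print_fields "1 3" "$tmp"
# prints:
# a,c
# d,f
rm -f "$tmp"
```

With the variables from the question, the call would be print_fields "$Fields" myfile.txt. Two choices in the sketch differ from the original idiom: s is reset for every record so it does not keep growing across lines, and the test is i > 1 rather than length(s), because with length(s) an empty first selected field would leave s empty and the next field would be appended without its separator.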
Answer:
You asked what this line is doing:
s = length(s) ? s OFS $(C[i]) : $(C[i])
You have encountered the ternary (conditional) operator, which has the following syntax:
condition ? valueiftrue : valueiffalse
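A quick way to see the syntax in action (the function name ternary_demo is hypothetical): awk treats the number 0 as false and any other number as true, and the whole expression yields one of the two values.

```shell
#!/bin/sh
# Hypothetical demo function: 0 is false, any other number is true.
ternary_demo() {
  awk 'BEGIN {
    for (x = 0; x <= 1; x++)
      print x, (x ? "true" : "false")   # condition ? valueiftrue : valueiffalse
  }'
}

ternary_demo
# prints:
# 0 false
# 1 true
```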
The length function, when given a single argument, returns its number of characters. In GNU AWK the integer 0 is considered false and all other integers true, so here it serves as an "is not empty" check. When s is not empty, it is concatenated with the output field separator (OFS, a space by default) and the value of field number C[i], and the result is assigned back to s. When s is empty (which includes the case where it has not been initialized yet, since GNU AWK treats an uninitialized variable as an empty string), s is simply set to the value of field number C[i]. Applied repeatedly, this builds a string of values separated by OFS. Consider the following simple example: say you want to get the diagonal of a 2D matrix stored in file.txt with the following content:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
then you could run
awk '{s = length(s) ? s OFS $(NR) : $(NR)}END{print s}' file.txt
which outputs
1 7 13 19 25
Explanation: NR is the number of the current record (row), so for the 1st row $(NR) is the 1st field, for the 2nd row it is the 2nd field, for the 3rd row the 3rd field, and so on.
(tested in GNU Awk 5.0.1)
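To watch the idiom build s step by step, the sketch below prints s after every record of the same matrix, recreated in a temporary file (the function name trace_diagonal is hypothetical). The first line shows that s starts out unset, so the false branch of the ternary is taken only once.

```shell
#!/bin/sh
# Hypothetical tracer for the diagonal example: show s after each record.
trace_diagonal() {
  awk '{
    # length(s) is 0 (false) only on the first record, when s is still unset
    s = length(s) ? s OFS $(NR) : $(NR)
    print "NR=" NR, "s=" s          # show s after each record
  }' "$1"
}

tmp=$(mktemp)                       # recreate the 5x5 matrix from the answer
printf '%s\n' '1 2 3 4 5' '6 7 8 9 10' '11 12 13 14 15' \
  '16 17 18 19 20' '21 22 23 24 25' > "$tmp"
trace_diagonal "$tmp"
# prints:
# NR=1 s=1
# NR=2 s=1 7
# NR=3 s=1 7 13
# NR=4 s=1 7 13 19
# NR=5 s=1 7 13 19 25
rm -f "$tmp"
```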