How to use awk to filter values by column when the column pattern is not clear

I have a text file that looks like this:

[email protected]      squid:uselessImportCheck     appname    2min  this is a random text    CODE_SMELL
[email protected]      squid:uselessImportCheck     appname    7min  random text here         BUG
[email protected]      squid:uselessImportCheck     appname    2min  random                   VULNERABILITY
[email protected]      squid:uselessImportCheck     appname    9min  text                     CODE_SMELL
[email protected]      squid:uselessImportCheck     appname    3min  text random              BUG

I want to extract the first and 6th columns of this text, so the result should look like this:

[email protected]     CODE_SMELL
[email protected]     BUG
[email protected]     VULNERABILITY
[email protected]     CODE_SMELL
[email protected]     BUG

I tried this with

awk '{print $1 $6}' filename.txt

But this does not work. Although the 5th column can be identified visually, it contains spaces and unpredictable free text, so the number of fields it produces cannot be predicted. As a result, the column that is visually 6th is not the 6th field that awk sees.

Can anyone help me get the expected output?

Edit: the text structure I have given here is not the actual one. In the actual text, the required values are not in the first and last columns, so I cannot use

awk '{print $1 $NF}'

I am only showing this for demonstration.

Here is the actual text for reference:

MAJOR        [email protected]        squid:uselessImportCheck        appname        2min        this is a random text        CODE_SMELL        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        7min        random text here        BUG        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        2min        random        VULNERABILITY        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        9min        text        CODE_SMELL        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        3min        text random        BUG        "Unused import"        "Default-organization"

Each main text field is separated by 8 spaces.

CodePudding user response：

Since each main text field is separated by 8 spaces, use eight spaces as the field separator. For example, to print the next-to-last field, use:

$ awk -F'        ' '{print $(NF-1)}' file

Output:

"Unused import"
"Unused import"
"Unused import"
"Unused import"
"Unused import"

Tested with GNU awk, mawk, busybox awk and awk version 20121220. If using GNU awk, you could also write the separator as -F' {8}'.
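
Applied to the question itself, the same separator can pull out the 2nd and 7th fields. This is only a minimal sketch: the addresses are made-up placeholders (the real ones are not shown in the question), and the variable names sep, sample and result are hypothetical. The separator is built with printf so that it is exactly eight spaces.

```shell
# Build the eight-space separator; '%8s' pads an empty string to 8 spaces.
sep=$(printf '%8s' '')

# Two sample records shaped like the question's data.
# The e-mail addresses are placeholders, not taken from the original file.
sample=$(printf '%s\n' \
  "MAJOR${sep}user1@example.com${sep}squid:uselessImportCheck${sep}appname${sep}2min${sep}this is a random text${sep}CODE_SMELL${sep}\"Unused import\"${sep}\"Default-organization\"" \
  "MAJOR${sep}user2@example.com${sep}squid:uselessImportCheck${sep}appname${sep}7min${sep}random text here${sep}BUG${sep}\"Unused import\"${sep}\"Default-organization\"")

# Use the eight-space string as FS and print fields 2 and 7.
# The comma in print joins them with OFS, which defaults to one space.
result=$(printf '%s\n' "$sample" | awk -F"$sep" '{print $2, $7}')
printf '%s\n' "$result"
```

Single spaces inside the free-text field are left alone because only a run of eight spaces counts as a separator. To read from a file instead, drop the printf pipe and pass the file name to awk.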

CodePudding user response：

I would use GNU AWK for this task in the following way. Let the content of file.txt be

MAJOR        [email protected]        squid:uselessImportCheck        appname        2min        this is a random text        CODE_SMELL        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        7min        random text here        BUG        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        2min        random        VULNERABILITY        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        9min        text        CODE_SMELL        "Unused import"        "Default-organization"
MAJOR        [email protected]        squid:uselessImportCheck        appname        3min        text random        BUG        "Unused import"        "Default-organization"

then

awk 'BEGIN{FS="[[:space:]]{2,}"}{print $2,$7}' file.txt

gives output

[email protected] CODE_SMELL
[email protected] BUG
[email protected] VULNERABILITY
[email protected] CODE_SMELL
[email protected] BUG

Explanation: I inform GNU AWK that the field separator (FS) is 2 or more ({2,}) whitespace characters, then print the 2nd and 7th fields for each line. Disclaimer: this solution assumes that there is never a run of 2 or more whitespace characters inside the 6th column. If you want to know more about FS, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR.

(tested in gawk 4.2.1)
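
To illustrate the disclaimer, here is a small sketch of what happens when the free-text field does contain a double space. The record and its address are made up, and the names sep, line, loose and strict are hypothetical. Note one swap: the separator is written as '  +' (two spaces followed by +) instead of [[:space:]]{2,}, so that it also runs in awks without interval expressions; it matches two or more spaces only, not tabs.

```shell
# Eight-space separator, built with printf so the count is exact.
sep=$(printf '%8s' '')

# One made-up record whose 6th column contains a double space ("random  text").
line="MAJOR${sep}user1@example.com${sep}squid:uselessImportCheck${sep}appname${sep}2min${sep}random  text${sep}BUG${sep}\"Unused import\"${sep}\"Default-organization\""

# Two-or-more-spaces separator: the double space inside column 6 also splits,
# so every later field shifts by one.
loose=$(printf '%s\n' "$line" | awk -F'  +' '{print NF, $7}')

# Exactly-eight-spaces separator: the double space stays inside column 6.
strict=$(printf '%s\n' "$line" | awk -F"$sep" '{print NF, $7}')

printf 'two or more spaces:   %s\n' "$loose"    # 10 text
printf 'exactly eight spaces: %s\n' "$strict"   # 9 BUG
```

If the input is known to use exactly eight spaces between fields, the stricter separator is the safer choice. The two-or-more form is more forgiving when the spacing between columns varies.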