Cut string in bash based on position of occurrence in first line

I'm trying to select specific columns (2 and 3) from command output, where the column separator is also present in the text. Consider this example text (fictional part of nmcli output):

           SSID BSSID             RSSI CHANNEL
      Something 12:34:56:78:98:ab -10  1
Something guest a2:34:56:78:98:ab -10  2
        Network b2:34:56:78:98:ab -20  3 
 Public network c2:34:56:78:98:ab -30  4

Columns are split by single spaces, but there are also single spaces in the SSIDs. Without those, it's an easy solution using cut.

In GNU sed, I somehow made it work using \w\s\w to replace spaces in SSIDs with underscores:

sed -e 's/\w\s\w/_/g'

But that doesn't work on the Mac. So my second best idea is splitting by position. Is it possible to:

determine the position of BSSID, RSSI and CHANNEL strings in first line
cut between BSSID_pos and RSSI_pos (1), and between RSSI_pos and CHANNEL_pos (2)

Is there a smarter way? Those spaces in SSIDs also confuse awk, so

awk '{print $2 " " $3}'

doesn't work in those lines.

--- EDIT --- The problem is that, in the output of the macOS airport -s command, the last field also contains spaces, so I can't count from the end either:

                            SSID BSSID             RSSI CHANNEL HT CC SECURITY (auth/unicast/group)
                             AA1 d8:12:b6:b8:10:d5 -92  9       Y -- WPA(PSK/AES/AES) RSN(PSK/AES/AES)
                     XXXXX Guest ba:11:e4:4a:01:71 -90  6       Y -- RSN(PSK/AES/AES)
                         Netwrrk 5a:99:68:b7:0d:ca -89  36      Y DE RSN(PSK/AES/AES)
                      T-3_7edb87 64:11:ea:7e:db:87 -87  11      Y -- WPA(PSK/AES,TKIP/TKIP) RSN(PSK/AES,TKIP/TKIP)
                           abcde 58:22:8f:c6:7c:de -86  6       Y -- RSN(PSK/AES/AES)
                           abcde 64:66:88:3f:b2:9a -78  8       Y -- RSN(PSK/AES,TKIP/TKIP)
                           ababa 74:ac:77:b1:e3:59 -74  6       Y -- RSN(PSK/AES/AES)

Apologies for two questions in one; thank you all!

CodePudding user response：

Using any awk in any shell on every Unix box:

$ awk 'NR==1{ match($0,/BSSID.*RSSI/); next } { $0=substr($0,RSTART,RLENGTH); $1=$1; print }' file
d8:12:b6:b8:10:d5 -92
ba:11:e4:4a:01:71 -90
5a:99:68:b7:0d:ca -89
64:11:ea:7e:db:87 -87
58:22:8f:c6:7c:de -86
64:66:88:3f:b2:9a -78
74:ac:77:b1:e3:59 -74

CodePudding user response：

The following has been tested with GNU awk and the BSD awk that comes with macOS (but as Ed commented it should work with any POSIX awk):

$ awk 'NR==1 {n = match($0, /[[:space:]]BSSID[[:space:]]/) + 1}
       NR>1  {$0 = substr($0, n); print $1, $2}' file.txt
12:34:56:78:98:ab -10
a2:34:56:78:98:ab -10
b2:34:56:78:98:ab -20
c2:34:56:78:98:ab -30

The first block applies only on the first line and stores the index of BSSID in variable n. The second block applies to all other lines. It modifies them by suppressing the n-1 first characters, and prints the new first and second fields.
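
The same header-offset idea extends to the two separate cuts the question asks about (BSSID up to RSSI, then RSSI up to CHANNEL). What follows is only a sketch: it assumes each header name occurs once in the first line and that the values are left-aligned under their headers. make_sample and slice_by_header are hypothetical helper names, and the rows are the first sample from the question.

```shell
#!/bin/sh
# Recreate the nmcli-style sample from the question
# (SSID right-aligned in 15 columns, the other fields left-aligned).
make_sample() {
  printf '%15s %-17s %-4s %s\n' \
    'SSID' 'BSSID' 'RSSI' 'CHANNEL' \
    'Something' '12:34:56:78:98:ab' '-10' '1' \
    'Something guest' 'a2:34:56:78:98:ab' '-10' '2' \
    'Network' 'b2:34:56:78:98:ab' '-20' '3' \
    'Public network' 'c2:34:56:78:98:ab' '-30' '4'
}

# slice_by_header (hypothetical name): cut two columns by header offsets.
slice_by_header() {
  awk '
    NR == 1 {
      # Start offsets of the three columns, read from the header line.
      b = index($0, "BSSID")
      r = index($0, "RSSI")
      c = index($0, "CHANNEL")
      next
    }
    {
      bssid = substr($0, b, r - b)   # from BSSID up to, not including, RSSI
      rssi  = substr($0, r, c - r)   # from RSSI up to, not including, CHANNEL
      sub(/ +$/, "", bssid)          # drop the column padding
      sub(/ +$/, "", rssi)
      print bssid, rssi
    }' "$@"
}

make_sample | slice_by_header
```

Because each column is cut on its own, spaces in the SSID or in any column to the right of CHANNEL do not affect the result.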

Just for fun (because it is a bit more complex), here is also a sed solution based on the same idea of parsing the header line, and tested with GNU sed and the BSD sed that comes with macOS.

$ sed -En '
1 {
  s/^(.*[[:space:]])BSSID[[:space:]].*/\1/
  h
}
2,$ {
  G
  :a
  s/.(.*\n.*)./\1/
  ta
  s/^([^[:space:]]+[[:space:]]+[^[:space:]]+).*/\1/
  p
}' file.txt
12:34:56:78:98:ab -10
a2:34:56:78:98:ab -10
b2:34:56:78:98:ab -20
c2:34:56:78:98:ab -30

We use extended regular expressions (-E) and suppress the default echoing (-n).

We first delete the end of the first line starting at BSSID (substitute command s/^(.*[[:space:]])BSSID[[:space:]].*/\1/) and store the result in the hold space (h). With your first example the hold space now contains (beginning and end marked with ^ and $):

^           SSID $

All lines except the first (2,$) are modified as follows, and printed (final p):

Append a newline followed by the hold space (G). The second line of your example becomes:
```
^      Something 12:34:56:78:98:ab -10  1\n           SSID $
```
Iterate (loop :a ... ta) by deleting the first character and the last character after the newline until all characters after the newline have been deleted (substitute command s/.(.*\n.*)./\1/). The second line of your example becomes:
```
^12:34:56:78:98:ab -10  1\n$
```
Delete everything after (and including) the second string of spaces (s/^([^[:space:]]+[[:space:]]+[^[:space:]]+).*/\1/). The second line of your example becomes:
```
^12:34:56:78:98:ab -10$
```
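
To see the character-dropping loop in isolation, here is a minimal sketch on a toy input. The first line only supplies a width (its length), and every later line loses that many leading characters. The function name drop_header_width and the input strings are made up for illustration.

```shell
#!/bin/sh
# drop_header_width (hypothetical name): remove from every line after the
# first as many leading characters as the first line is long.
drop_header_width() {
  sed -En '
# Line 1: keep it in the hold space and print nothing.
1 {
  h
  d
}
# Other lines: append a newline plus the held first line.
G
# Drop one character at the front and one at the very end per iteration,
# until nothing is left after the newline.
:a
s/.(.*\n.*)./\1/
ta
# Remove the now trailing newline and print.
s/\n$//
p
'
}

printf '%s\n' 'abc' 'HELLOworld' 'xyz123' | drop_header_width
```

With a three-character first line, HELLOworld becomes LOworld and xyz123 becomes 123.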

The GNU sed version is a bit more compact, if it matters:

$ sed -En '1{s/^(.*)\<BSSID\>.*/\1/;h;n}
    {G;:a;s/.(.*\n.*)./\1/;ta;s/^(\S+\s+\S+).*/\1/;p}' file.txt

CodePudding user response：

How about GNU grep (tested on Red Hat)? It picks out the needed string directly with a regex. Written and tested with the shown samples only.

grep -oP '(?<=\s)[0-9a-f]{2}(:[0-9a-f]{2}){5}\s+-?[[:digit:]]{2}'  Input_file

OR in perl try following (thanks to @tripleee for this):

perl -nle 'm/(?<=\s)[0-9a-f]{2}(:[0-9a-f]{2}){5}\s+-?[[:digit:]]{2}/ and print "$&"'  Input_file
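
Since the question mentions the Mac, where the stock grep is, as far as I know, a BSD grep without -P, the same match can be approximated with extended regular expressions. This is a sketch, not a drop-in equivalent: it drops the lookbehind, so it assumes nothing in front of the BSSID (such as the end of an SSID) looks like a lowercase MAC address. It also relies on -o, which GNU and BSD grep both support although POSIX does not specify it. make_airport_sample and extract_bssid_rssi are hypothetical names; the rows are taken from the airport -s sample in the question.

```shell
#!/bin/sh
# Recreate a few rows of the airport -s sample from the question.
make_airport_sample() {
  printf '%32s %-17s %-4s %-7s %-2s %-2s %s\n' \
    'SSID' 'BSSID' 'RSSI' 'CHANNEL' 'HT' 'CC' 'SECURITY (auth/unicast/group)' \
    'AA1' 'd8:12:b6:b8:10:d5' '-92' '9' 'Y' '--' 'WPA(PSK/AES/AES) RSN(PSK/AES/AES)' \
    'XXXXX Guest' 'ba:11:e4:4a:01:71' '-90' '6' 'Y' '--' 'RSN(PSK/AES/AES)' \
    'Netwrrk' '5a:99:68:b7:0d:ca' '-89' '36' 'Y' 'DE' 'RSN(PSK/AES/AES)'
}

# extract_bssid_rssi (hypothetical name): print "MAC RSSI" for each match.
# Six colon-separated hex pairs, one or more spaces, then a signed integer.
extract_bssid_rssi() {
  grep -oE '[0-9a-f]{2}(:[0-9a-f]{2}){5} +-?[0-9]+' "$@"
}

make_airport_sample | extract_bssid_rssi
```

The header line contains no MAC address, so it produces no output and needs no special handling.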

CodePudding user response：

This is why many Unix tools prefer to put the variable-length fields last (compare ls -l) but as long as the tail has predictable formatting, just pick out the things counting from the end.

sed '2,$s/^.* \([0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f] *[-0-9]*\)  *[0-9]* *$/\1/'

awk 'FNR>1{ print $(NF-2) " " $(NF-1) }'

A better solution altogether is to read the manual page and search for an option to print only the information you need.

nmcli -t -f BSSID,RSSI

(I can't run nmcli inside Docker, it seems, so no way to test this; but I trust you can take it from here even if the command line doesn't directly work.)

CodePudding user response：

I would use the substr string function of GNU AWK as follows. Let file.txt contain

           SSID BSSID             RSSI CHANNEL
      Something 12:34:56:78:98:ab -10  1
Something guest a2:34:56:78:98:ab -10  2
        Network b2:34:56:78:98:ab -20  3 
 Public network c2:34:56:78:98:ab -30  4

then

awk '{print substr($0,17,22)}' file.txt

gives output

BSSID             RSSI
12:34:56:78:98:ab -10 
a2:34:56:78:98:ab -10 
b2:34:56:78:98:ab -20 
c2:34:56:78:98:ab -30

Explanation: I print the 22-character substring that starts at the 17th character.

(tested in GNU Awk 5.0.1)
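
The offsets 17 and 22 are specific to this sample. Since the question asks about cut, here is a hedged sketch that derives the character range from the header line and hands it to cut -c. It assumes the header names occur only once in the first line. make_sample and header_range are hypothetical names, and the temporary file stands in for file.txt.

```shell
#!/bin/sh
# Recreate the nmcli-style sample from the question.
make_sample() {
  printf '%15s %-17s %-4s %s\n' \
    'SSID' 'BSSID' 'RSSI' 'CHANNEL' \
    'Something' '12:34:56:78:98:ab' '-10' '1' \
    'Something guest' 'a2:34:56:78:98:ab' '-10' '2' \
    'Network' 'b2:34:56:78:98:ab' '-20' '3' \
    'Public network' 'c2:34:56:78:98:ab' '-30' '4'
}

# header_range FILE (hypothetical name): print "FIRST-LAST", the character
# range running from the B of BSSID to the last letter of RSSI.
header_range() {
  head -n 1 "$1" | awk '{
    first = index($0, "BSSID")
    last  = index($0, "RSSI") + length("RSSI") - 1
    print first "-" last
  }'
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
make_sample > "$tmp"

range=$(header_range "$tmp")
echo "$range"
cut -c "$range" "$tmp"
```

For this sample the range comes out as 17-38, which selects the same characters as substr($0,17,22), header line included.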