How to use awk to read part of a line, counting the spaces?

I want to extract a value using awk's substr, which should also count the spaces instead of treating them as a separator.

For example, below is the input, and I want to extract "29611", including the space:

201903011232101029 2961104E3021  223  0 12113  5  15 8288 298233 0  45  0     39    4

I tried this method, but awk treated the space as a field separator:

more abbas.dat | awk '{print substr($1,1,16),substr($1,17,25)}'

The expected output is:

201903011232101029  2961

But it prints only:

201903011232101029

My question is: how can I print with substr so that it counts the spaces?

I know I can use this command to get the desired output, but it does not help with my objective:

more abbas.dat | awk '{print substr($1,1,16),substr($2,1,5)}'

CodePudding user response：

1st solution: With your shown samples, please try the following awk code, written and tested in GNU awk. It uses awk's match function to get the required output.

To print the 1st field, followed by the varying run of spaces, followed by the first 5 digits of the 2nd field, use the following:

awk 'match($0,/^[0-9]+[[:space:]]+[0-9]{5}/){print substr($0,RSTART,RLENGTH)}' Input_file

OR, to print the first 16 characters of the 1st field and the first 5 of the 2nd field, including the varying run of spaces between them (the three-argument match is a GNU awk extension):

awk 'match($0,/^([0-9]{16})[^[:space:]]+([[:space:]]+)([0-9]{5})/,arr){print arr[1] arr[2] arr[3]}' Input_file
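
For awk implementations without the three-argument match(), the same idea can be sketched with only match(), RSTART, RLENGTH and substr(). This is a minimal sketch, not the answerer's method: it assumes the line begins with the first field and that the gap consists of spaces or tabs, and the variable names first, gap and second are purely illustrative.

```shell
# Sample line taken from the question.
line='201903011232101029 2961104E3021  223  0 12113  5  15 8288 298233 0  45  0     39    4'

# Find the first run of blanks, then stitch together:
#   16 characters of the line + the blanks as they appear + 5 characters after them.
out=$(printf '%s\n' "$line" | awk '
match($0, /[ \t]+/) {
    first  = substr($0, 1, 16)                # first 16 characters of the line
    gap    = substr($0, RSTART, RLENGTH)      # the run of blanks, unchanged
    second = substr($0, RSTART + RLENGTH, 5)  # 5 characters after the blanks
    print first gap second
}')

printf '%s\n' "$out"   # prints: 2019030112321010 29611
```

Because the gap is copied with substr() rather than rebuilt, two or more spaces between the fields would be carried over unchanged.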

2nd solution: Using GNU grep, please try the following, considering that the first 5 needed characters of your 2nd column can be anything (e.g. digits, letters, etc.).

grep -oP '^\S+\s+.{5}' Input_file

OR, to match only 5 digits in the 2nd field, make a minor change to the above grep:

grep -oP '^\S+\s+\d{5}' Input_file

CodePudding user response：

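I think the simplest way is to include "-Fs" in your command. It sets the field separator to the letter "s"; since the sample line contains no "s", $1 then holds the whole line, spaces included.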

more abbas.dat | awk -Fs '{print substr($1,1,16),substr($1,17,25)}'
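
A quick way to check what -Fs does to the sample line is sketched below. It assumes the data never contains the letter "s"; the variable names nf and same are illustrative only.

```shell
# Sample line taken from the question.
line='201903011232101029 2961104E3021  223  0 12113  5  15 8288 298233 0  45  0     39    4'

# With -Fs the separator is the letter "s", which this line does not contain,
# so awk sees a single field and $1 is identical to $0.
nf=$(printf '%s\n' "$line" | awk -Fs '{print NF}')
same=$(printf '%s\n' "$line" | awk -Fs '{print (($1 == $0) ? "yes" : "no")}')

printf 'fields=%s same=%s\n' "$nf" "$same"   # prints: fields=1 same=yes
```

If an "s" ever appeared in the data, the line would be split there, so working on $0 directly is the less fragile choice.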

CodePudding user response：

If there is always exactly one space, you can use the following command, which prints the first group plus the first 5 characters of the second group.
N.B. It's not clear from the question whether you want 4 or 5 characters, but that can be adjusted easily.

more abbas.dat | awk '{print $1" "substr($2,1,5) }'
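
The "always one space" caveat can be illustrated with a short sketch. The two-space input below is a made-up variation of the sample, used only to show that rebuilding the output from $1 and $2 replaces the original gap with a single space, whereas slicing $0 keeps it.

```shell
# Hypothetical variation of the sample line with TWO spaces between the fields.
line='201903011232101029  2961104E3021'

# Rebuilding from fields inserts exactly one space, whatever the input had.
rebuilt=$(printf '%s\n' "$line" | awk '{print $1" "substr($2,1,5)}')

# Cutting from $0 keeps the original spacing (18 + 2 + 5 = 25 characters here).
sliced=$(printf '%s\n' "$line" | awk '{print substr($0,1,25)}')

printf '[%s]\n[%s]\n' "$rebuilt" "$sliced"
```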

CodePudding user response：

$ awk '{print substr($0,1,24)}' file
201903011232101029 29611

If that's not all you need then edit your question to clarify your requirements.
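
A short sketch of why this works where the command in the question did not: $0 is the whole record with its spaces intact, while $1 stops at the first separator, and the third argument of substr() is a length, not an end position. The variable names below are illustrative only.

```shell
# Sample line taken from the question.
line='201903011232101029 2961104E3021  223  0 12113  5  15 8288 298233 0  45  0     39    4'

# Characters 1-24 of the whole line: 18 digits, 1 space, 5 characters.
whole=$(printf '%s\n' "$line" | awk '{print substr($0, 1, 24)}')

# Only the 5 characters that follow the space (positions 20-24).
tail5=$(printf '%s\n' "$line" | awk '{print substr($0, 20, 5)}')

# $1 is only 18 characters long, so 25 characters from position 17 yields just 2.
from_field=$(printf '%s\n' "$line" | awk '{print substr($1, 17, 25)}')

printf '[%s]\n[%s]\n[%s]\n' "$whole" "$tail5" "$from_field"
```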