How to extract the number after specific word using awk?

Time:06-11

I have several lines of text. I want to extract the number that follows a specific word using awk.

I tried the following code but it does not work.

First, create the test file with: vi test.text. It has 3 columns (the 3 fields are generated by some other pipeline commands using awk).

Index  AllocTres                              CPUTotal
1      cpu=1,mem=256G                         18
2      cpu=2,mem=1024M                        16
3                                             4
4      cpu=12,gres/gpu=3                      12
5                                             8
6                                             9
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21

Please note that several fields in this file are empty. What I want to achieve is to extract the number after the first gres/gpu= in each line (if gres/gpu= does not occur in the line, the default is 0), using a pipeline like cat test.text | awk '{some_commands}', so that the output has 4 columns:

Index  AllocTres                              CPUTotal   GPUAllocated
1      cpu=1,mem=256G                         18         0
2      cpu=2,mem=1024M                        16         0
3                                             4          0
4      cpu=12,gres/gpu=3                      12         3
5                                             8          0
6                                             9          0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20         4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21         3

CodePudding user response:

Firstly: awk does not need cat, it can read files on its own. Combining cat and awk is generally discouraged as a useless use of cat.
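
As a small illustration of that point, here is a sketch showing that both forms give awk the same input. The temporary file and its two sample lines are made up for the demonstration.

```shell
# Build a throwaway input file (contents are illustrative only).
tmp=$(mktemp)
printf '%s\n' '1 cpu=1 18' '2 cpu=2 16' > "$tmp"

# Same awk program, once behind cat and once reading the file directly.
with_cat=$(cat "$tmp" | awk '{ print $3 }')
without_cat=$(awk '{ print $3 }' "$tmp")

rm -f "$tmp"
[ "$with_cat" = "$without_cat" ] && echo "identical output"
```

The direct form saves one process and one pipe, which is why it is usually preferred.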

For this task I would use GNU AWK in the following way. Let file.txt content be

cpu=1,mem=256G
cpu=2,mem=1024M

cpu=12,gres/gpu=3


cpu=13,gres/gpu=4,gres/gpu:ret6000=2
mem=12G,gres/gpu=3,gres/gpu:1080ti=1

then

awk 'BEGIN{FS="gres/gpu="}{print $2+0}' file.txt

output

0
0
0
3
0
0
4
3

Explanation: I inform GNU AWK that the field separator (FS) is gres/gpu=, then for each line I print the 2nd field increased by zero. For lines without gres/gpu=, $2 is an empty string, which in an arithmetic context is the same as zero, so zero plus zero gives zero. For lines with at least one gres/gpu=, adding zero makes GNU AWK use the longest prefix of $2 that is a legal number, thus 3 (4th line) becomes 3, 4,gres/gpu:ret6000=2 (7th line) becomes 4, and 3,gres/gpu:1080ti=1 (8th line) becomes 3.

(tested in GNU Awk 5.0.1)
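
The string-to-number conversion described above can be checked in isolation. This is only a sketch: leading_number is a hypothetical helper name, not part of the answer's code, and it simply adds zero to whatever string it is given.

```shell
# leading_number is a hypothetical helper: it prints the number awk
# derives from a string when that string is used in arithmetic.
leading_number() {
    printf '%s\n' "$1" | awk '{ print $0 + 0 }'
}

leading_number '4,gres/gpu:ret6000=2'   # numeric prefix is 4
leading_number '3'                      # already a plain number
leading_number ''                       # empty string counts as 0
```

This conversion is what lets the one-liner fall back to 0 without an explicit if.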

CodePudding user response:

Using sed

$ sed '1s/$/\tGPUAllocated/;s~.*gres/gpu=\([0-9]\).*~& \t\1~;1!{\~gres/gpu=[0-9]~!s/$/ \t0/}' input_file
Index  AllocTres                              CPUTotal  GPUAllocated
1      cpu=1,mem=256G                         18        0
2      cpu=2,mem=1024M                        16        0
3                                             4         0
4      cpu=12,gres/gpu=3                      12        3
5                                             8         0
6                                             9         0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20        4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21        3

CodePudding user response:

With your shown samples, you can try the following code in GNU awk. Written and tested in GNU awk. Simple explanation: it uses awk's match function with the regex gres\/gpu=([0-9]+) (escaping / here), which has one and only one capturing group to capture all the digits coming after =. For every line it then prints the current line followed by arr[1]+0, the 1st element of array arr plus zero, so that 0 is printed when no match is found for a line.

awk '
FNR==1{
  print $0,"GPUAllocated"
  next
}
{
  match($0,/gres\/gpu=([0-9]+)/,arr)
  print $0,arr[1]+0
}
' Input_file
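
The three-argument form of match used above is a GNU awk extension. For other awk implementations, a possible variant is sketched below using the standard two-argument match together with RSTART and RLENGTH. The function name add_gpu_column and the short sample lines are assumptions made for this illustration.

```shell
# add_gpu_column is a hypothetical name; it reads the table on stdin
# and appends the number after the first gres/gpu= (0 if there is none).
add_gpu_column() {
    awk '
    NR == 1 { print $0, "GPUAllocated"; next }    # header line
    {
        gpu = 0                                    # default when nothing matches
        if (match($0, /gres\/gpu=[0-9]+/))         # sets RSTART and RLENGTH
            # skip the 9 characters of "gres/gpu=" and keep only the digits
            gpu = substr($0, RSTART + 9, RLENGTH - 9) + 0
        print $0, gpu
    }'
}

# Shortened sample input, for illustration only.
printf '%s\n' 'Index AllocTres CPUTotal' \
              '4 cpu=12,gres/gpu=3 12' \
              '5 8' | add_gpu_column
```

Because match returns the leftmost match, a later gres/gpu:ret6000=2 on the same line is ignored, just as in the capturing-group version.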

CodePudding user response:

awk '
    BEGIN{FS="\t"}
    NR==1{
        $(NF+1)="GPUAllocated"
    }
    NR>1{
        $(NF+1)=FS 0
    }
    /gres\/gpu=/{
        split($0, a, "=")
        gp=a[3]; gsub(/[ ,].*/, "", gp)  
        $NF=FS gp
    }1' test.text 

Index  AllocTres                              CPUTotal GPUAllocated
1      cpu=1,mem=256G                         18        0
2      cpu=2,mem=1024M                        16        0
3                                             4         0
4      cpu=12,gres/gpu=3                      12        3
5                                             8         0
6                                             9         0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20        4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21        3