Extract text between two blocks using regex

Time:06-10

I am trying to extract the text between the two strings using the following regex.

(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)

This regex works in regex101, but it does not print the pod details when used with perl or grep -P. The command below produces empty output.

kubectl describe  node |perl -le '/(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)/m; printf "$1"'

Here is the sample input:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

Question:

  1. How can I extract the pod details from the above output so that it looks like the listing below? What is wrong with the regex or the command I am using?
Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)  

Question-2: What if I have two blocks of similar input? How can I extract the pod details from each? E.g.:

if the input is:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:
....some
.......random data...
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo-1                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-2                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp3-2                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

CodePudding user response:

With some obvious assumptions, and keeping it close to the pattern in the question:

perl -0777 -wnE'
    @pods = /Non-terminated\s+Pods:\s+\([0-9]+\s+in\s+total\)\n(.*?)\nAllocated resources:/gs;
    say for @pods
' input-file

(note modifiers on this wide line: /gs)


The question does not state exactly how that regex is "used with perl".

When I use the regex from the question verbatim in place of the one in this answer, it works (and without the /s modifier, as it should, since the pattern carries its own inline (?s)). To handle multiple such blocks in a file, with other text in between, change its (.*) to (.*?) so that each capture stops at the nearest "Allocated resources" rather than the last one.

Explanation of the perl command-line program at the top of this answer:

  • the -0777 switch makes perl read the whole file into a single string, available in the program in the variable $_, on which the regex is applied by default
    (the switch -g is available as an alias for -0777, starting with 5.36.0)

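  • we still need the -n switch so that the program iterates over the "lines" of input (STDIN or a file). In this case the input record separator has been undefined, so there is just one "line". Without -n (or -p) perl never reads the input at all, which would explain an empty result from a bare perl -le one-liner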

  • the regex captures are assigned to the array @pods, for further processing
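The effect of the switches described above can be checked with a small experiment. This is only a sketch: the sample file is hypothetical and abbreviated, and the shell variable names (prog, no_loop, per_line, slurped) are made up for the comparison. The same perl code is run three times, changing only the switches.

```shell
# Hypothetical, abbreviated sample standing in for `kubectl describe node` output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace   Name
  default     foo
Allocated resources:
....some random data...
Non-terminated Pods:          (7 in total)
  Namespace   Name
  default     foo-1
Allocated resources:
EOF

# The perl code is identical in all three runs.
prog='say for /Non-terminated\s+Pods:\s+\([0-9]+\s+in\s+total\)\n(.*?)\nAllocated resources:/gs'

# No -n/-p: perl never reads the input, so there is nothing to match against.
no_loop=$(perl -E "$prog" < "$sample")

# -n alone: $_ holds one line at a time, so a multi-line pattern cannot match.
per_line=$(perl -nE "$prog" "$sample")

# -0777 -n: the whole file is one record, and /g returns every block.
slurped=$(perl -0777 -nE "$prog" "$sample")

printf 'no loop:  [%s]\nper line: [%s]\nslurped:\n%s\n' "$no_loop" "$per_line" "$slurped"
rm -f "$sample"
```

Only the third run prints the pod tables; the first two produce empty output.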

CodePudding user response:

With GNU grep you can use your regex with a few tweaks:

kubectl describe  node |
grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
  • Used \K (match reset) after \R to remove that line from output
  • Used the -z option to treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline, so the whole input is read as one record and the pattern can span lines.

PS: The same regex also works on the second (two-block) input; the header line is printed before each block.
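A minimal sketch of the grep command on a two-block input, assuming GNU grep built with PCRE support (-P). The sample file is hypothetical and abbreviated. The tr step is an addition for display only: with -z, each match is written with a NUL terminator, and tr turns those into newlines, which also leaves a blank line between blocks.

```shell
# Hypothetical, abbreviated sample standing in for `kubectl describe node` output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace   Name
  default     foo
Allocated resources:
....some random data...
Non-terminated Pods:          (7 in total)
  Namespace   Name
  default     foo-1
Allocated resources:
EOF

# -z reads the file as one record, -o prints only the matched part,
# \K drops the "Non-terminated Pods" line from each match.
pods=$(grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)' "$sample" | tr '\0' '\n')

printf '%s\n' "$pods"
rm -f "$sample"
```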


Alternatively, you can use any version of sed for this job as well:

kubectl describe  node |
sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
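The sed range restarts each time the start pattern matches again, so the same command should also cover the two-block case. A minimal sketch on a hypothetical, abbreviated sample file (note that the blocks are printed back to back, with no separator between them):

```shell
# Hypothetical, abbreviated sample standing in for `kubectl describe node` output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace   Name
  default     foo
Allocated resources:
....some random data...
Non-terminated Pods:          (7 in total)
  Namespace   Name
  default     foo-1
Allocated resources:
EOF

# The address range selects each block including its two boundary lines;
# the empty regex // reuses the last regex tried, so //!p skips both boundaries.
pods=$(sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}' "$sample")

printf '%s\n' "$pods"
rm -f "$sample"
```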