How do I extract two different keywords from two different lines in a file in bash shell?

Time:05-15

I have a file called data.txt. When I read the file, its content looks like this:

$ cat data.txt 
name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
              ~snapshot*
              .zfs
              .isilon
              .ifsvar
              .lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: https://server1.example.com:30002: /mnt/tube

How can I extract the keyword Linux from the third line and server1.example.com from the last line, and print them on the same line separated by a space? The output should look like this:

Linux server1.example.com

I tried something like this, but I am not sure how to extract server1.example.com:

cat data.txt | egrep "type|mounts" | awk '{print $NF}' | tr "\n" " "

output was:

Linux /mnt/tube

My expected output:

Linux server1.example.com

Solving it with AWK/SED will work for me.

Thank you!

CodePudding user response:

Whenever you have tag-value pairs in your input, I find it best to create an array (f[] below) to hold those mappings; you can then access the values by indexing that array with the tags:

$ cat tst.awk
{
    tag = val = $0
    sub(/:.*/,"",tag)
    sub(/[^:]+: */,"",val)
    f[tag] = val
}
END {
    sub("[^:]+://","",f["mounts"])
    sub(/:.*/,"",f["mounts"])
    print f["type"], f["mounts"]
}

$ awk -f tst.awk data.txt
Linux server1.example.com

The above would need a minor tweak if you also wanted to handle the multi-line value for the dir excludes tag but as written it gives you the ability to read/test/print all other values by their tags.
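
One possible version of that tweak, as a rough sketch rather than the only way to do it: treat any line that starts with whitespace as a continuation of the previous tag and append it to that tag's value. This assumes continuation lines are always indented and tag lines never are. The data variable below is a trimmed copy of the question's file, and the names data and result are only for illustration.

```shell
#!/bin/bash
# Trimmed copy of the question's data.txt, kept in a variable so the sketch is self-contained.
data='type: Linux
dir excludes: .snapshot*
              ~snapshot*
              .zfs
              .isilon
              .ifsvar
              .lustre
mounts: https://server1.example.com:30002: /mnt/tube'

result=$(printf '%s\n' "$data" | awk '
/^[ \t]/ {                        # indented line: continuation of the previous tag (assumption)
    val = $0
    sub(/^[ \t]+/, "", val)       # drop the leading indentation
    f[tag] = f[tag] " " val       # append to the value stored for the last tag seen
    next
}
{
    tag = val = $0
    sub(/:.*/, "", tag)           # tag = text before the first colon
    sub(/[^:]+: */, "", val)      # val = text after the first colon and any spaces
    f[tag] = val
}
END {
    print f["dir excludes"]
}')

echo "$result"
```

With the sample above this should print all six exclude patterns on one line, separated by single spaces.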

CodePudding user response:

If only two lines in the file match, one with type: and another with mounts:, and they come in a set order, you can use

awk '/type:|mounts:/{gsub(/https?:\/\/|:.*/, "", $2); a = (length(a)==0 ? "" : a " ") $2} END{print a}' file

If a line contains type: or mounts:, the http:// or https:// prefix and everything from the next : onward are removed from Field 2. The value is then either assigned to a or appended to a after a space, and at the end of the file a is printed.

Details:

  • /type:|mounts:/ - find lines containing type: or mounts:
  • gsub(/https?:\/\/|:.*/, "", $2) - remove http:// or https://, and any : together with the rest of the string, from the Field 2 value
  • a = (length(a)==0 ? "" : a " ") $2 - if a is empty, assign the Field 2 value to a; otherwise append a space and the Field 2 value to a
  • END{print a} - once the whole file has been processed, print the value of a.

See the online demo:

#!/bin/bash
s='name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
              ~snapshot*
              .zfs
              .isilon
              .ifsvar
              .lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: h''ttps://server1.example.com:30002: /mnt/tube'

awk '/type:|mounts:/{gsub(/https?:\/\/|:.*/, "", $2); a = (length(a)==0 ? "" : a " ") $2} END{print a}' <<< "$s"

Output:

Linux server1.example.com
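
If the set-order assumption does not hold, one way around it (a sketch only, not part of the one-liner above) is to store each value under its own name and print both in END. The variable names type, host, data and result are illustrative, and the sample deliberately lists mounts: before type: to show that the order no longer matters.

```shell
#!/bin/bash
# Same two lines as in the question, deliberately in reverse order.
data='mounts: https://server1.example.com:30002: /mnt/tube
type: Linux'

result=$(printf '%s\n' "$data" | awk '
$1 == "type:"   { type = $2 }            # remember the type value
$1 == "mounts:" {
    host = $2                            # still has the scheme and the port at this point
    gsub(/https?:\/\/|:.*/, "", host)    # strip the scheme, then everything from the next colon
}
END { print type, host }')

echo "$result"
```

Because the values are printed by name, the output order is fixed by the print statement rather than by the order of lines in the file.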

CodePudding user response:

Using gnu-awk you could also set the field separator to : followed by 1 or more spaces.

Then check the first field for type or mounts, and for mounts use a capture group to get the part after the https:// part

Given that the order of the lines is fixed and both keywords are present, you can concatenate the values.

awk -F ":[[:space:]]+" '
$1 == "type" {s = $2}
$1 == "mounts" && match($2, /https?:\/\/([^[:space:]:]+)/, a) {s = s " " a[1]}
END {print s}
' data.txt

Output

Linux server1.example.com
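
The third argument to match() is a gawk extension. If gawk is not available, a similar result can probably be had with the two-argument match() plus RSTART and RLENGTH, which other awk implementations also provide. This is only a sketch: the field separator is simplified to a colon followed by one or more spaces, and host, data and result are illustrative names.

```shell
#!/bin/bash
# The two relevant lines from the question's data.txt.
data='type: Linux
mounts: https://server1.example.com:30002: /mnt/tube'

result=$(printf '%s\n' "$data" | awk -F ': +' '
$1 == "type" { s = $2 }
$1 == "mounts" && match($2, /https?:\/\/[^ :]+/) {
    host = substr($2, RSTART, RLENGTH)   # matched text, scheme included
    sub(/^https?:\/\//, "", host)        # drop the scheme
    s = s " " host
}
END { print s }')

echo "$result"
```

As with the gawk version, this relies on the type line coming before the mounts line.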

CodePudding user response:

I think a sed solution is also needed

$ sed -n '/type:/{s/[^ ]* \(.*\)/\1/;h;d};/mounts:/{s|[^/]*//\([^:]*\).*|\1|;x;G;s/\n/ /p}' input_file
Linux server1.example.com

Or as a script

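$ cat script.sed
# type line: keep only the text after the first space, save it in the hold space,
# then delete the pattern space so nothing is printed yet
/type:/ {
    s/[^ ]* \(.*\)/\1/
    h
    d
}
# mounts line: keep only the host between "//" and the next ":"
/mounts:/ {
    s|[^/]*//\([^:]*\).*|\1|
    # swap: the pattern space now holds the type, the hold space holds the host
    x
    # append the host after a newline, then turn that newline into a space and print
    G
    s/\n/ /p
}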
$ sed -nf script.sed input_file
Linux server1.example.com