I have a file called data.txt. When I read the file, its content looks like this:
$ cat data.txt
name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
~snapshot*
.zfs
.isilon
.ifsvar
.lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: https://server1.example.com:30002: /mnt/tube
How can I extract the keywords Linux from the third line and server1.example.com from the last line, and then print them on the same line separated by a space? The output should look like this:
Linux server1.example.com
I tried something like this, but I am not sure how to extract server1.example.com:
cat data.txt | egrep "type|mounts" | awk '{print $NF}' | tr "\n" " "
output was:
Linux /mnt/tube
My expected output:
Linux server1.example.com
Solving it with AWK/SED will work for me.
Thank you!
CodePudding user response:
Whenever you have tag-value pairs in your input, I find it best to create an array (f[] below) to hold those mappings; you can then access the values by indexing that array with the tags:
$ cat tst.awk
{
    tag = val = $0
    sub(/:.*/,"",tag)
    sub(/[^:]+: */,"",val)
    f[tag] = val
}
END {
    sub("[^:]+://","",f["mounts"])
    sub(/:.*/,"",f["mounts"])
    print f["type"], f["mounts"]
}
$ awk -f tst.awk data.txt
Linux server1.example.com
The above would need a minor tweak if you also wanted to handle the multi-line value for the dir excludes tag, but as written it gives you the ability to read/test/print all other values by their tags.
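One possible form of that tweak, as a minimal sketch rather than the answerer's own code: treat any line that does not start with "tag: " as a continuation of the previous tag and append it to that tag's value. The comma used as a joiner and the host variable are assumptions made for illustration, and the sketch assumes no continuation line itself contains a colon followed by a space.

```shell
# Sample input copied from the question.
data='name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
~snapshot*
.zfs
.isilon
.ifsvar
.lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: https://server1.example.com:30002: /mnt/tube'

result=$(printf '%s\n' "$data" | awk '
/^[^:]+: / {                  # line starts a new "tag: value" pair
    tag = val = $0
    sub(/: .*/, "", tag)      # keep only the text before the first ": "
    sub(/^[^:]+: */, "", val) # keep only the text after the first ": "
    f[tag] = val
    next
}
{ f[tag] = f[tag] "," $0 }    # continuation line: append to the current tag
END {
    host = f["mounts"]
    sub(/^[^:]+:\/\//, "", host)  # drop the scheme, e.g. https://
    sub(/:.*/, "", host)          # drop the port and everything after it
    print f["type"], host
    print f["dir excludes"]
}')
printf '%s\n' "$result"
# Linux server1.example.com
# .snapshot*,~snapshot*,.zfs,.isilon,.ifsvar,.lustre
```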
CodePudding user response:
If there are just two lines of interest in the file, one with type: and another with mounts:, and they come in a set order, you can use
awk '/type:|mounts:/{gsub(/https?:\/\/|:.*/, "", $2); a = (length(a)==0 ? "" : a " ") $2} END{print a}' file
If a line contains type: or mounts:, the http:// or https:// prefix and everything from the next : onward are removed from Field 2. The value is then either assigned to a or appended to a after a space, and once the end of the file is reached, the value of a is printed.
Details:
/type:|mounts:/ - finds lines containing type: or mounts:
gsub(/https?:\/\/|:.*/, "", $2) - removes http://, https://, or : and the rest of the string from the Field 2 value
a = (length(a)==0 ? "" : a " ") $2 - if a is not empty, appends a space and the Field 2 value to a; if it is empty, just assigns the Field 2 value to a
END{print a} - at the end of file processing, prints the value of a.
See the online demo:
#!/bin/bash
s='name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
~snapshot*
.zfs
.isilon
.ifsvar
.lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: h''ttps://server1.example.com:30002: /mnt/tube'
awk '/type:|mounts:/{gsub(/https?:\/\/|:.*/, "", $2); a = (length(a)==0 ? "" : a " ") $2} END{print a}' <<< "$s"
Output:
Linux server1.example.com
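The one-liner above relies on the type: line coming before the mounts: line. If that order cannot be guaranteed, one alternative is to split each line on runs of colons, slashes, and spaces and store the two values in separate variables. This is only a sketch: the variable names type and host are made up here, and it assumes the mounts value always starts with a scheme such as https:// so that the host lands in Field 3.

```shell
# Hypothetical input with the two lines of interest in reverse order.
data='mounts: https://server1.example.com:30002: /mnt/tube
type: Linux'

result=$(printf '%s\n' "$data" | awk -F '[:/ ]+' '
$1 == "type"   { type = $2 }   # fields: type, Linux
$1 == "mounts" { host = $3 }   # fields: mounts, https, server1.example.com, 30002, mnt, tube
END { print type, host }')
printf '%s\n' "$result"   # Linux server1.example.com
```

Because the values are kept in named variables and only printed in END, the output order is fixed regardless of the order of the input lines.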
CodePudding user response:
Using gnu-awk you could also set the field separator to : followed by 1 or more spaces. Then check the first field for type or mounts, and for mounts use a capture group to get the part after the https:// part. Given that the order of the lines is the same, and both keywords are present, you can concatenate the values.
awk -F ":[[:space:]]+" '
$1 == "type" {s = $2}
$1 == "mounts" && match($2, /https?:\/\/([^[:space:]:]+)/, a) {s = s " " a[1]}
END {print s}
' data.txt
Output
Linux server1.example.com
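The three-argument form of match() used above is a GNU awk extension. Where gawk is not available, roughly the same result can be had with the standard two-argument match() plus RSTART and RLENGTH. The following is a sketch under that assumption, not part of the original answer; the url variable is introduced for illustration, and [ \t] stands in for [[:space:]] to stay compatible with older awk implementations.

```shell
# Only the two relevant lines from the question's data.txt are needed here.
data='type: Linux
mounts: https://server1.example.com:30002: /mnt/tube'

result=$(printf '%s\n' "$data" | awk -F ':[ \t]+' '
$1 == "type" { s = $2 }
$1 == "mounts" && match($2, /https?:\/\/[^ \t:]+/) {
    url = substr($2, RSTART, RLENGTH)  # matched text, e.g. https://server1.example.com
    sub(/^https?:\/\//, "", url)       # strip the scheme, leaving the host
    s = s " " url
}
END { print s }')
printf '%s\n' "$result"   # Linux server1.example.com
```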
CodePudding user response:
I think a sed solution is also needed:
$ sed -n '/type:/{s/[^ ]* \(.*\)/\1/;h;d};/mounts:/{s|[^/]*//\([^:]*\).*|\1|;x;G;s/\n/ /p}' input_file
Linux server1.example.com
Or as a script
$ cat script.sed
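/type:/ {
    # keep only the text after the first space (the type value)
    s/[^ ]* \(.*\)/\1/
    # save it in the hold space, then drop the line without printing
    h
    d
}
/mounts:/ {
    # keep only the host: the text between // and the next colon
    s|[^/]*//\([^:]*\).*|\1|
    # swap so the type comes first, then append the host on a new line
    x
    G
    # join the two parts with a space and print
    s/\n/ /p
}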
$ sed -nf script.sed input_file
Linux server1.example.com