I have a file that contains, for example:
Diameter is defined by... diameter 26.5/9 diameter has been changed
diameter has changed and then DIAMETER 942/10 yada di yada Diameter
In the output I want:
26.5/9
942/10
As far as I understand:
I need to search for the case-insensitive string "diameter" followed by a space and a numeric character, then go back to "diameter" and print the string that follows it, up to the next space.
I've only managed to come up with something like this:
sed -n '/^.*diameter\s\+\(\w\+\).*$/s//\1/p'
but it only prints a single word (delimited by non-alphanumeric characters) following the first occurrence of the case-sensitive word "diameter". The output would look like this:
26
has
CodePudding user response:
1st solution: Using the match function of awk; written and tested in GNU awk. GNU awk's IGNORECASE option lets the regex match both diameter and DIAMETER. awk's match function is used with the regex diameter [0-9]+(\.[0-9]+)?\/[0-9]+, which matches diameter followed by a space, digits, an optional dot and digits, then a slash and digits, to get the diameter values.
awk -v IGNORECASE="1" '
match($0,/diameter [0-9]+(\.[0-9]+)?\/[0-9]+/){
  print substr($0,RSTART+9,RLENGTH-9)
}
' Input_file
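IGNORECASE is specific to GNU awk. As a rough sketch for other awk implementations, the same idea can be tried by matching against a lowercased copy of the line with tolower() and cutting the value out of the original line. The function name extract_diameters and the temporary sample file are made up for illustration; the sample lines are the ones from the question.

```shell
# Hypothetical helper: print the value that follows "diameter " in any letter case.
extract_diameters() {
  awk '{
    # Lowercased copy used only for matching; the original line is kept for output.
    low = tolower($0)
    if (match(low, /diameter [0-9]+(\.[0-9]+)?\/[0-9]+/)) {
      # "diameter " is 9 characters long, so skip it and print the rest of the match.
      print substr($0, RSTART + 9, RLENGTH - 9)
    }
  }' "$1"
}

# Sample input taken from the question, written to a scratch file.
sample=$(mktemp)
printf '%s\n' \
  'Diameter is defined by... diameter 26.5/9 diameter has been changed' \
  'diameter has changed and then DIAMETER 942/10 yada di yada Diameter' > "$sample"

extract_diameters "$sample"
```

Since tolower() does not change the length of plain ASCII text, RSTART and RLENGTH from the lowercased copy can be applied to the original line.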
2nd solution: With your shown samples, please try the following awk code. It sets awk's RS variable to the same regex explained in the 1st solution; RT, which holds the text that matched RS, is specific to GNU awk. In the main program, the RT value is split into the array arr and the 2nd element of arr, which is the diameter value, is printed.
awk -v RS='(diameter|DIAMETER) [0-9]+(\\.[0-9]+)?\\/[0-9]+' '
RT{
  split(RT,arr)
  print arr[2]
}
' Input_file
3rd solution: Using sed; written and tested in GNU sed. The -E option of sed enables ERE (extended regular expressions). The main program matches everything from the start of the line up to diameter OR DIAMETER, followed by spaces, followed by the same regex mentioned above in a capturing group (a sed feature that saves the matched text in a temporary buffer for later use), and substitutes the whole line with the captured part only.
sed -E 's/.*(diameter|DIAMETER)\s+([0-9]+(\.[0-9]+)?\/[0-9]+).*/\2/' Input_file
4th solution: With GNU grep, try the following regex. grep's -o and -P options print only the matched part and enable PCRE regex, respectively. The regex does a lazy match from the start of the line up to the diameter OR DIAMETER string followed by spaces, then uses \K to forget everything matched so far. That is followed by the regex that matches the value required by the OP.
grep -oP '.*?(?:diameter|DIAMETER)\s+\K[0-9]+(\.[0-9]+)?\/[0-9]+' Input_file
Output will be as follows:
26.5/9
942/10
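One caveat: a single match() call or a single sed substitution reports at most one value per line. If a line could contain several "diameter N/M" pairs, a loop over match() is one possible approach. The following is only a sketch under that assumption; the function name extract_all_diameters and the test line are invented for illustration, and tolower() is used instead of IGNORECASE so that it is not tied to GNU awk.

```shell
# Hypothetical helper: print every value following "diameter " on each input line.
extract_all_diameters() {
  awk '{
    rest = $0            # unprocessed part of the original line
    low = tolower($0)    # lowercased copy, kept in step with rest
    while (match(low, /diameter [0-9]+(\.[0-9]+)?\/[0-9]+/)) {
      # Skip the 9 characters of "diameter " and print the value.
      print substr(rest, RSTART + 9, RLENGTH - 9)
      # Drop everything up to the end of this match and search again.
      rest = substr(rest, RSTART + RLENGTH)
      low = substr(low, RSTART + RLENGTH)
    }
  }'
}

# Made-up line with two values on it.
printf '%s\n' 'Diameter 1/2 and DIAMETER 3.5/4' | extract_all_diameters
```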
CodePudding user response:
With GNU sed, using the I flag to make the match case-insensitive:
$ sed -n 's/.*diameter \([0-9][^ ]*\).*/\1/pI' file
26.5/9
942/10
With any sed and tr:
tr 'A-Z' 'a-z' < file | sed -n 's/.*diameter \([0-9][^ ]*\).*/\1/p'
CodePudding user response:
You can also use grep:
$ grep -Pio '(?<=diameter\s)[\d.\/]+(?=\s)' input_file
26.5/9
942/10
(?<=): Positive lookbehind.
diameter\s: Matches diameter followed by a white space.
[\d.\/]+: Matches one or more digits, dots or slashes.
(?=): Positive lookahead.
\s: Matches a white space.
-P: Perl style regex.
-i: Case insensitive.
-o: Print only matching part.
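If the installed grep was built without -P support, a similar result can probably be had with extended regexes plus cut. This is a sketch, not a drop-in replacement: it borrows the [0-9][^ ]* pattern from the sed answer above, relies on -o (widely available, though not required by POSIX), and the function name extract_with_grep and the scratch file are illustrative only.

```shell
# Hypothetical helper: grep prints "diameter VALUE" pairs, cut keeps the VALUE field.
extract_with_grep() {
  grep -oiE 'diameter [0-9][^ ]*' "$1" | cut -d ' ' -f 2
}

# Sample input taken from the question, written to a scratch file.
sample_file=$(mktemp)
printf '%s\n' \
  'Diameter is defined by... diameter 26.5/9 diameter has been changed' \
  'diameter has changed and then DIAMETER 942/10 yada di yada Diameter' > "$sample_file"

extract_with_grep "$sample_file"
```

Unlike the lookahead version, this variant does not require a white space after the value, so a value at the very end of a line is also printed.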