Search for string and then print a word (defined by spaces only) after it

Time:03-10

I have a file that contains, e.g.:

Diameter is defined by... diameter 26.5/9 diameter has been changed
diameter has changed and then DIAMETER 942/10 yada di yada Diameter

In the output I want:

26.5/9
942/10

As far as I understand:

I need to search for the case-insensitive string "diameter" followed by a space and a numeric character, then go back to "diameter" and print the following string delimited by spaces.

I've only managed to come up with something like this:

sed -n '/^.*diameter\s\+\(\w\+\).*$/s//\1/p'

but it only prints a single word (delimited by non-alphanumeric characters) following the first occurrence of the case-sensitive word "diameter". Output would look like this:

26
has

CodePudding user response:

1st solution: Using awk's match function, written and tested in GNU awk. GNU awk's IGNORECASE variable makes the regex match both diameter and DIAMETER. The regex diameter [0-9]+(\.[0-9]+)?\/[0-9]+ matches diameter followed by a space, one or more digits, an optional dot with more digits, a slash and one or more digits. substr then skips the 9 characters of "diameter " and prints only the value.

awk -v IGNORECASE="1" '
match($0,/diameter [0-9]+(\.[0-9]+)?\/[0-9]+/){
  print substr($0,RSTART+9,RLENGTH-9)
}
' Input_file


2nd solution: With your shown samples, please try the following awk code. It sets RS to a regex (the same one explained in the 1st solution, with both spellings listed since IGNORECASE is not set here). In the main program the text that matched the separator, available in GNU awk's RT variable, is split into array arr, and the 2nd element of arr, which is the diameter value, is printed.

awk -v RS='(diameter|DIAMETER) [0-9]+(\\.[0-9]+)?\\/[0-9]+' '
RT{
  split(RT,arr)
  print arr[2]
}
' Input_file


3rd solution: Using sed, written and tested in GNU sed. The -E option enables ERE (extended regular expressions). The regex matches everything from the start of the line up to diameter OR DIAMETER, followed by whitespace and then the same value regex as above inside a capturing group (a sed feature that saves the matched text for later use). The substitution replaces the whole line with the captured value only.

sed -E 's/.*(diameter|DIAMETER)\s+([0-9]+(\.[0-9]+)?\/[0-9]+).*/\2/' Input_file


4th solution: With GNU grep, try the following. The -o and -P options print only the matched part and enable PCRE regex respectively. The regex does a lazy match from the start of the line up to diameter OR DIAMETER followed by whitespace, then uses \K to forget everything matched so far. What follows \K is the value regex, so only the value required by the OP is printed.

grep -oP '.*?(?:diameter|DIAMETER)\s+\K[0-9]+(\.[0-9]+)?\/[0-9]+' Input_file

Output will be as follows:

26.5/9
942/10
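
IGNORECASE and RT are GNU awk extensions, so the 1st and 2nd solutions may not behave the same in other awk implementations. As a rough sketch only (not part of the tested solutions above), the 1st solution could be made portable by matching against a lower-cased copy of the line and printing from the original. The function name extract_diameter is hypothetical, and the file to read is assumed to be passed as the first argument.

```shell
# Hypothetical helper: portable variant of the 1st solution for any POSIX awk.
# Assumes the input file is passed as $1.
extract_diameter() {
  awk '
  {
    lower = tolower($0)   # match on a lower-cased copy, print from the original line
    if (match(lower, /diameter [0-9]+(\.[0-9]+)?\/[0-9]+/))
      print substr($0, RSTART + 9, RLENGTH - 9)   # skip the 9 characters of "diameter "
  }' "$1"
}
```

Calling extract_diameter Input_file prints at most one value per line, the same as the 1st solution.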

CodePudding user response:

With GNU sed for the I flag to make the match case-insensitive:

$ sed -n 's/.*diameter \([0-9][^ ]*\).*/\1/pI' file
26.5/9
942/10

With any sed + tr:

tr 'A-Z' 'a-z' < file | sed -n 's/.*diameter \([0-9][^ ]*\).*/\1/p'

CodePudding user response:

You can also use grep:

$ grep -Pio '(?<=diameter\s)[\d.\/]+(?=\s)' input_file
26.5/9
942/10

  • (?<=): Positive lookbehind.
  • diameter\s: Matches diameter followed by a whitespace character.
  • [\d.\/]: Matches either a digit, a dot or a slash.
  • +: Matches the preceding token between one and unlimited times (greedy).
  • (?=): Positive lookahead.
  • \s: Matches a white space.

  • -P: Perl style regex.
  • -i: Case insensitive.
  • -o: Print only matching part.
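
The -P option is not available in every grep build, and the (?=\s) lookahead needs a whitespace character after the value, so a value at the very end of a line would not be matched. As a hedged alternative sketch, the same "word after diameter" idea can be written with awk fields. The function name words_after_diameter is hypothetical, the file is assumed to be passed as the first argument, and awk's default field splitting treats tabs as well as spaces as separators.

```shell
# Hypothetical helper: print every whitespace-delimited word that follows
# "diameter" (any letter case) and starts with a digit. Assumes the file is $1.
words_after_diameter() {
  awk '
  {
    # stop at NF - 1 because the last field has no following word
    for (i = 1; i < NF; i++)
      if (tolower($i) == "diameter" && $(i + 1) ~ /^[0-9]/)
        print $(i + 1)
  }' "$1"
}
```

Unlike the one-match-per-line sed and awk match versions, this prints every qualifying word on a line.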