how to get substring from

Time:11-15

How do I get a substring from this line

 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!         

so that the result is

BEGIN!@#Ghjk,GhjkEND#@!

Note: there is whitespace at the end of each line. I tried to remove it, but I could not.

I tried

#!/bin/bash

s=$(awk '/BEGIN!@#/,/END#@!/' switch.log )


while IFS= read -r line 
do

  h=$(echo "$line" | awk '{$1=$1;print}')
  for i in {0..100}
  do

    zzz=$(echo "$h"  | awk '{print $(NF-$i)}')

    if [ ! -z "$zzz" -a "$zzz" != " " ]; then

      hh=$(echo "$h"  | awk  '{print $(NF-$i)}') 
      echo "$zzz"

      echo  -e  "$zzz" >> ggg.txt
      break
    fi

  done

done <<< "$s"

I got

BEGIN!@#Ghjk,Ghj

CodePudding user response:

Another option is using sed with the normal substitute method storing the text you want to keep as the first two backreferences. For example:

sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)/\1\2/' <<< 'your string'

Example Use/Output

(note: updated to handle whitespace at the end)

$ sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)/\1\2/' <<< '42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!'
BEGIN!@#Ghjk,GhjkEND#@!

(note: single-quoting the string is required due to '!')
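
If the whitespace after the END token should be dropped as well, one option is to let the pattern consume the rest of the line. The following is only a sketch under the assumption that the input has the same shape as the sample in the question; the function name extract_sed is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep the BEGIN... and ...END tokens, drop everything else.
extract_sed() {
  # The final .*$ swallows whatever follows the END token, including spaces.
  sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+).*$/\1\2/'
}

# Sample line from the question, with leading and trailing spaces.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        '

result=$(printf '%s\n' "$line" | extract_sed)
printf '[%s]\n' "$result"   # brackets make stray whitespace visible
```

The brackets in the last line are only there to show that no whitespace is left around the result.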

CodePudding user response:

Using sed

$ sed -E 's/[0-9]+[a-z]? +| +//g' input_file
BEGIN!@#Ghjk,GhjkEND#@!

CodePudding user response:

UPDATED to fix an error: your question does not define precisely what the string to be extracted looks like in general, but based on your example, this would do:

if [[ $line =~ (BEGIN[^ ]+)\ .*([^ ]+END[^ ]+) ]]
then
  substring=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
else
  echo Pattern not found in line 1>&2
fi
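
A common way to avoid escaping spaces inside [[ ... ]] is to keep the regular expression in a variable and expand it unquoted on the right-hand side of =~. The sketch below wraps the same idea in a function; the names extract_rematch and re are chosen only for this example, and bash is assumed.

```shell
#!/bin/bash
# Hypothetical helper: print the BEGIN... and ...END tokens of one line, joined.
extract_rematch() {
  # Storing the pattern in a variable means the spaces need no backslashes.
  local re='(BEGIN[^ ]+) .*([^ ]+END[^ ]+)'
  if [[ $1 =~ $re ]]; then
    printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo 'Pattern not found in line' >&2
    return 1
  fi
}

# Sample line from the question, with leading and trailing spaces.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        '
extract_rematch "$line"
```

Because [^ ]+ stops at the first space, the trailing whitespace of the line never ends up in the second captured group.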

CodePudding user response:

I would use GNU AWK for this task in the following way. Let the content of file.txt be

 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        

then

awk 'BEGIN{FPAT="[^[:space:]]*(BEGIN|END)[^[:space:]]*";OFS=""}{$1=$1;print}' file.txt

gives output

BEGIN!@#Ghjk,GhjkEND#@!

Explanation: the field pattern (FPAT) tells GNU AWK that a field is BEGIN or (|) END, preceded and followed by zero or more (*) non-whitespace ([^[:space:]]) characters, and the output field separator (OFS) is set to the empty string. For each line, $1=$1 forces the record to be rebuilt from its fields, and the result is printed. If you are sure that only space characters occur in the line, you may replace [^[:space:]] with [^ ].

(tested in gawk 4.2.1)
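
FPAT is a GNU AWK extension, so the command above needs gawk. For other awk implementations, a similar effect can be approximated by calling match() in a loop and concatenating each hit. This is only a sketch: it assumes that tokens are separated by spaces or tabs, and the variable names out and rest are made up for the example.

```shell
#!/bin/sh
# Sample line from the question, with leading and trailing spaces.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        '

result=$(printf '%s\n' "$line" | awk '
    {
        out = ""
        rest = $0
        # Repeatedly find the next token containing BEGIN or END.
        while (match(rest, /[^ \t]*(BEGIN|END)[^ \t]*/)) {
            out = out substr(rest, RSTART, RLENGTH)   # append the token
            rest = substr(rest, RSTART + RLENGTH)     # continue after it
        }
        if (out != "")
            print out
    }
')
printf '[%s]\n' "$result"
```

Lines without BEGIN or END produce no output here, whereas the FPAT version prints an empty line for them.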

CodePudding user response:

The inherent logic of the transformation is unclear so you have a few options that will work for the sample input:

  • Remove all pairs of space-delimited hexadecimal digits and spaces
sed -nE -e 's/(^| )[[:xdigit:]]{2}( [[:xdigit:]]{2})*( |$)| +//g' \
        -e '/BEGIN!@#|END#@!/p'
  • Print all the space-delimited substrings that contain BEGIN!@# or END#@!:
awk '
    {
        ok = 0
        for (i = 1; i <= NF; i++)
            if ($i ~ /BEGIN!@#|END#@!/) {
                printf "%s", $i
                ok = 1
            }
        if (ok)
            print ""
    }
'
  • Extract the substrings delimited by BEGIN!@# and END#@! and remove the space-delimited content between them:
awk '
    match($0,/BEGIN!@#.*END#@!/) {
        s = substr($0,RSTART,RLENGTH)
        sub(/ .* | /,"",s)
        print s
    }
'
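
As a usage sketch for the last option, the same awk program can be fed several lines at once. The two-line sample below is hypothetical: the first line has no markers and the second is the line from the question.

```shell
#!/bin/sh
# Hypothetical input: one unrelated line, then the sample line with trailing spaces.
sample=$(printf '%s\n' \
  'some unrelated log line' \
  ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        ')

result=$(printf '%s\n' "$sample" | awk '
    match($0, /BEGIN!@#.*END#@!/) {
        s = substr($0, RSTART, RLENGTH)   # text from BEGIN!@# through END#@!
        sub(/ .* | /, "", s)              # drop the space-delimited middle part
        print s
    }
')
printf '[%s]\n' "$result"
```

The line without markers is skipped, and because the match ends at END#@!, the trailing whitespace of the input never enters s.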