Remove everything after certain character and length

Time:05-26

I want to remove all text after a certain format.

<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION="" CREATED_BY="vpma" RUN_AS="ctmagt" CRITICAL="0" TASKTYPE="Dummy" NODEID="OPENFRAME"  %%ENVIRONMENT MLETTXXD %%ORDERID %%RUNCOUNT %%JCL_STEP" CONFIRM="0" RETRO="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0"  TIMETO="&gt;" JAN="1" FEB="1" MAR="1" 
                <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
                <INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />

On the line that contains JOBNAME="...", delete everything before and after that string.

The output should be:

JOBNAME="MLETTXXD-NONR_005"
                <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
                <INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />

I tried the command below, but the second awk condition does not work.

awk '/JOBNAME=/{print $4} | /INCOND/{print $2}' inputfile.txt

CodePudding user response:

Using sed

$ sed 's/.*\(JOBNAME[^ ]*\).*/\1/' input_file
JOBNAME="MLETTXXD-NONR_005"
                <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
                <INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
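
Note that [^ ]* stops at the first space, so a value that itself contains a space would be cut short. A minimal sketch of a variant that runs to the closing quote instead, assuming values never contain an embedded double quote (the sample line is made up for illustration):

```shell
# Hypothetical sample line; the JOBNAME value contains a space.
line='<JOB APPLICATION="Daily" JOBNAME="Another Job" CREATED_BY="vpma"'

# [^"]* matches up to the closing quote rather than up to the first space.
result=$(printf '%s\n' "$line" | sed 's/.*\(JOBNAME="[^"]*"\).*/\1/')
printf '%s\n' "$result"
```

Because the leading .* is greedy, this keeps only the last JOBNAME="..." when a line holds more than one.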

CodePudding user response:

One simple fix to OP's current awk code:

$ awk '/JOBNAME=/{$0=$4}1' inputfile.txt
JOBNAME="MLETTXXD-NONR_005"
                <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
                <INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />

NOTES:

  • $0=$4 says to replace the current line with the contents of the 4th field
  • assumes OP's /INCOND/ pattern match is an attempt to print the rest of the lines of input hence ...
  • the standalone 1 says to print the current line
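
For comparison, here is a sketch of what OP's two-condition command was probably aiming for. In awk, | is not a separator between rules; pattern-action pairs are simply written one after another. The shortened sample lines are an assumption for illustration, and unlike the 1 version above this prints only lines that match one of the two patterns:

```shell
# Shortened, hypothetical version of OP's input (two lines).
input='<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION=""
    <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />'

# Two independent rules: print field 4 of JOBNAME lines, print INCOND lines whole.
result=$(printf '%s\n' "$input" | awk '/JOBNAME=/{print $4} /INCOND/{print}')
printf '%s\n' "$result"
```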

This has a few limitations:

  • assumes the JOBNAME="..." string is always in the 4th space-delimited field of a line
  • does not take into consideration multiple instances of the string in a single line
  • assumes the string does not contain any white space
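
To see these limitations in practice, here is a quick sketch that runs the same command on a line where JOBNAME="..." is the 3rd field, appears twice, and contains spaces (the line is chosen purely to trigger the problem):

```shell
# JOBNAME is not in field 4 here, and its value contains white space.
line='<JOB APPLICATION="Daily" JOBNAME="JOBNAME # 1" DESCRIPTION="" JOBNAME="Another Job" CREATED_BY="vpma"'

# Field 4 is now just the "#" from inside the first quoted value.
result=$(printf '%s\n' "$line" | awk '/JOBNAME=/{$0=$4}1')
printf '%s\n' "$result"
```

The command still succeeds, but what it prints is a fragment of the value rather than the JOBNAME="..." string.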

Addressing the limitations ...

First we'll add a new line to the input:

$ cat inputfile.txt
<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION="" CREATED_BY="vpma" RUN_AS="ctmagt" CRITICAL="0" TASKTYPE="Dummy" NODEID="OPENFRAME"  %%ENVIRONMENT MLETTXXD %%ORDERID %%RUNCOUNT %%JCL_STEP" CONFIRM="0" RETRO="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0"  TIMETO="&gt;" JAN="1" FEB="1" MAR="1"
                <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
                <INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
<JOB APPLICATION="Daily" JOBNAME="JOBNAME # 1" DESCRIPTION="" JOBNAME="Another Job" CREATED_BY="vpma"

A GNU awk idea:

awk '
BEGIN { FPAT="\\<JOBNAME=\"[^\"]*\"" }    # define field pattern as JOBNAME="..."
NF    { pfx=""                            # if we have a FPAT match then NF>0
        for (i=1;i<=NF;i++) {             # loop through our FPAT matches
            printf "%s%s",pfx,$i          # print each FPAT match to stdout
            pfx=OFS
        }
        print ""                          # terminate the line of FPAT matches
        next                              # go to next line of input
      }
1                                         # print all lines that do not have a FPAT match
' inputfile.txt

NOTE:

  • GNU awk is needed for FPAT support (this allows us to define the format of the field; this replaces the use of FS which defines the format of the field delimiter)
  • standalone 1 assumes OP wants to print all other lines of input that don't have a match to the string JOBNAME="..." (otherwise OP should update the sample input to contain lines that should not be printed)

This generates:

JOBNAME="MLETTXXD-NONR_005"
                <INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
                <INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
JOBNAME="JOBNAME # 1" JOBNAME="Another Job"
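
If GNU awk is not available, roughly the same result can be had with match() and substr(), which are part of POSIX awk. This is only a sketch: the function name extract_jobnames is made up for illustration, and there is no equivalent of the \< word-boundary anchor here, so an attribute whose name merely ends in JOBNAME would also match.

```shell
# Hypothetical helper: print every JOBNAME="..." found on a line,
# or the line unchanged when it has none.
extract_jobnames() {
    awk '
    {
        rest = $0; out = ""
        # match() sets RSTART/RLENGTH for the leftmost JOBNAME="..." in rest
        while (match(rest, /JOBNAME="[^"]*"/)) {
            out = out (out == "" ? "" : OFS) substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
        print (out == "" ? $0 : out)
    }' "$@"
}
```

Usage: extract_jobnames inputfile.txt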

CodePudding user response:

Use this Perl one-liner:

perl -pe 's{ .* ( JOBNAME="[^"]*" ) .* }{$1}x;' in_file > out_file

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.

The regex uses these modifiers:
/x : Ignore whitespace and comments, for readability.

s{ .* ( JOBNAME="[^"]*" ) .* }{$1}x; : Match this pattern: .* - any character, repeated 0 or more times, followed by JOBNAME="[^"]*", in which [^"]* is any character except ", repeated 0 or more times, followed by another .*. Replace the whole match with $1: the first capture group, that is, whatever was matched inside the parentheses.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
