I want to remove all text before and after a certain pattern.
<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION="" CREATED_BY="vpma" RUN_AS="ctmagt" CRITICAL="0" TASKTYPE="Dummy" NODEID="OPENFRAME" %%ENVIRONMENT MLETTXXD %%ORDERID %%RUNCOUNT %%JCL_STEP" CONFIRM="0" RETRO="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" TIMETO=">" JAN="1" FEB="1" MAR="1"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
Delete everything before and after JOBNAME="..." on that line.
Output should be
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
I tried the command below, but the second awk condition does not work:
awk '/JOBNAME=/{print $4} | /INCOND/{print $2}' inputfile.txt
CodePudding user response:
Using sed
$ sed s'/.*\(JOBNAME[^ ]*\).*/\1/' input_file
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
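The [^ ]* in the pattern above stops at the first space, so a value containing a space would be cut short. A minimal sketch of a variant that matches up to the closing quote instead, assuming the value never contains an embedded double quote (the sample line is made up for illustration):

```shell
# Hypothetical sample line whose JOBNAME value contains a space.
line='<JOB APPLICATION="Daily" JOBNAME="JOB 1" DESCRIPTION="">'

# Match from JOBNAME=" up to the next double quote rather than the next space.
result=$(printf '%s\n' "$line" | sed 's/.*\(JOBNAME="[^"]*"\).*/\1/')
printf '%s\n' "$result"
```

Lines without JOBNAME="..." are still printed unchanged, because the substitution simply does not apply to them.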
CodePudding user response:
One simple fix to OP's current awk code:
$ awk '/JOBNAME=/{$0=$4}1' inputfile.txt
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
NOTES:
- $0=$4 says to replace the current line with the contents of the 4th field
- assumes OP's /INCOND/ pattern match is an attempt to print the rest of the lines of input, hence ...
- the standalone 1 says to print the current line
This has a few limitations:
- assumes the JOBNAME="..." string is always in the 4th space-delimited field of a line
- does not take into consideration multiple instances of the string in a single line
- assumes the string does not contain any white space
Addressing the limitations ...
First we'll add a new line to the input:
$ cat inputfile.txt
<JOB APPLICATION="Daily" SUB_APPLICATION="Y#D5#4#M2F" JOBNAME="MLETTXXD-NONR_005" DESCRIPTION="" CREATED_BY="vpma" RUN_AS="ctmagt" CRITICAL="0" TASKTYPE="Dummy" NODEID="OPENFRAME" %%ENVIRONMENT MLETTXXD %%ORDERID %%RUNCOUNT %%JCL_STEP" CONFIRM="0" RETRO="0" MAXRERUN="0" AUTOARCH="1" MAXDAYS="0" MAXRUNS="0" TIMETO=">" JAN="1" FEB="1" MAR="1"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
<JOB APPLICATION="Daily" JOBNAME="JOBNAME # 1" DESCRIPTION="" JOBNAME="Another Job" CREATED_BY="vpma"
A GNU awk idea:
awk '
BEGIN { FPAT="\\<JOBNAME=\"[^\"]*\"" }   # define field pattern as JOBNAME="..."
NF    { pfx=""                           # if we have a FPAT match then NF>0
        for (i=1;i<=NF;i++) {            # loop through our FPAT matches
            printf "%s%s",pfx,$i         # print each FPAT match to stdout
            pfx=OFS
        }
        print ""                         # terminate the line of FPAT matches
        next                             # go to next line of input
      }
1                                        # print all lines that do not have a FPAT match
' inputfile.txt
NOTES:
- GNU awk is needed for FPAT support (this allows us to define the format of the field; this replaces the use of FS, which defines the format of the field delimiter)
- the standalone 1 assumes OP wants to print all other lines of input that don't have a match to the string JOBNAME="..." (otherwise OP should update the sample input to contain lines that should not be printed)
This generates:
JOBNAME="MLETTXXD-NONR_005"
<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<INCOND NAME="PROD-PS#P#D3#SU2SA@E-TIME0000-098-OK" ODATE="ODAT" AND_OR="A" />
JOBNAME="JOBNAME # 1" JOBNAME="Another Job"
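Where GNU awk is not available, a similar result can probably be had with the POSIX match() function. The following is only a sketch under that assumption, not part of the answer above. Unlike the FPAT version it has no word-boundary check, so an attribute whose name merely ends in JOBNAME would also match.

```shell
# Sample input: one INCOND line plus the extra multi-JOBNAME line from above.
input='<INCOND NAME="PROD-A#D5#4#M2F-STRTDAYA-001-OK" ODATE="ODAT" AND_OR="A" />
<JOB APPLICATION="Daily" JOBNAME="JOBNAME # 1" DESCRIPTION="" JOBNAME="Another Job" CREATED_BY="vpma"'

result=$(printf '%s\n' "$input" | awk '
{
    out = ""; sep = ""; rest = $0
    # match() sets RSTART and RLENGTH for the leftmost match in rest
    while (match(rest, /JOBNAME="[^"]*"/)) {
        out  = out sep substr(rest, RSTART, RLENGTH)   # collect this match
        sep  = " "
        rest = substr(rest, RSTART + RLENGTH)          # continue after the match
    }
    # print the collected matches, or the untouched line if there were none
    print (out == "" ? $0 : out)
}')
printf '%s\n' "$result"
```

The loop consumes the line from left to right, so matches come out in their original order, separated by a single space.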
CodePudding user response:
Use this Perl one-liner:
perl -pe 's{ .* ( JOBNAME="[^"]*" ) .* }{$1}x;' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
The regex uses these modifiers:
/x : Ignore whitespace and comments, for readability.
s{ .* ( JOBNAME="[^"]*" ) .* }{$1}x; : Replace this pattern: .* (any character except a newline, repeated 0 or more times), followed by JOBNAME="[^"]*", where [^"]* is any character except ", repeated 0 or more times, followed by .* again. Replace this pattern with $1: the first capture group, that is, whatever was matched inside the parentheses.
SEE ALSO:
perldoc perlrun : how to execute the Perl interpreter: command line switches
perldoc perlre : Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick : Perl regular expressions quick start