I have a text file in which I need to find every line that matches a specific pattern and emit it twice, modified in two different ways.
Let's say this is the original file:
ID_00:/hdfs_01 name1
ID_01:/hdfs_02 name2
ID_02:/hdfs_03 name3
ID_03:/hdfs_app_data_01 name4
ID_04:/hdfs_app_data_02 name5
ID_05:/hdfs_cmmd_prt name6
Its structure is similar to that of an fstab.
The expected result is:
ID_00:/hdfs_01 name1_old
ID_06:/hdfs_01 name1
ID_01:/hdfs_02 name2_old
ID_07:/hdfs_02 name2
ID_02:/hdfs_03 name3_old
ID_08:/hdfs_03 name3
ID_03:/hdfs_app_data_01 name4_old
ID_09:/hdfs_app_data_01 name4
ID_04:/hdfs_app_data_02 name5_old
ID_10:/hdfs_app_data_02 name5
ID_05:/hdfs_cmmd_prt name6_old
ID_11:/hdfs_cmmd_prt name6
So far, I've considered running a simple sed search for hdfs, as it is the only thing the lines have in common, writing the matches to a separate text file, modifying them there, and then adding the whole modified block back to the source file.
I would do this twice: once to add the _old suffix and once to modify the ID.
Something like:
sed -n '/hdfs/p' >> new.file
Add the _old suffix in new.file with sed/awk and save the result in a different file.
Then do it again to update the ID and save that result in another file.
With those two files holding the expected output, I can easily remove the records from the current file, add the new ones, and reload the app.
This is more or less the plan so far.
But I would like to know whether this could be achieved using only sed and its buffer:
save the original matched line, modify it once for the _old suffix and print it (output #1), then modify the original line again for the ID and print it again (output #2), after which the original line can be removed.
CodePudding user response:
Assumptions:
- lines of interest start with ID_##: and have no trailing white space (all other lines are printed as is, without any modification)
- the ## is a zero-padded, 2-digit number
- the number of lines starting with ID_##: could vary and is not known beforehand, so ...
- we keep a count (c) of the number of lines starting with ID_##:
- for new lines we add c to ## to generate a new ID_##: label (e.g., ## = 02 and c = 6, so the new label will be ID_08:)
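The label arithmetic in the last assumption can be checked on its own. This is only a minimal sketch: the values 02 and 6 are the example figures from the list above, and new_label is a hypothetical variable name used for illustration.

```shell
# "02" stands for the ## part of an existing label; 6 stands for the count c.
# awk converts the string "02" to the number 2 before adding, and %02d
# restores the zero padding on the way out.
new_label=$(awk 'BEGIN { printf "ID_%02d:", "02" + 6 }')
echo "$new_label"
```

The padding has to be reapplied explicitly because the addition yields a plain number, not a 2-character string.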
I don't know sed well enough to know whether a running count can be maintained and then used to 'add' to a string in the pattern space (assuming the objective is to use a single sed script), so since the question is also tagged with awk ...
One (verbose) awk idea:
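awk '
    { lines[NR]=$0                           # buffer every input line
      if ($0 ~ /^ID_/) c++                   # count the lines of interest
    }
END { for (i=1;i<=NR;i++) {
          if (lines[i] ~ /^ID_/) {
             print lines[i] "_old"           # assumes line has no trailing white space otherwise we end up with "<space>_old"
             split(lines[i],a,":")           # a[1] = "ID_##", a[2] = rest of the line
             split(a[1],b,"_")               # b[2] = "##"
             newid=sprintf("%02d",b[2]+c)    # old number plus count, zero-padded to 2 digits
             print "ID_" newid ":" a[2]
          }
          else
             print lines[i]
      }
    }
' old.file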
This generates:
ID_00:/hdfs_01 name1_old
ID_06:/hdfs_01 name1
ID_01:/hdfs_02 name2_old
ID_07:/hdfs_02 name2
ID_02:/hdfs_03 name3_old
ID_08:/hdfs_03 name3
ID_03:/hdfs_app_data_01 name4_old
ID_09:/hdfs_app_data_01 name4
ID_04:/hdfs_app_data_02 name5_old
ID_10:/hdfs_app_data_02 name5
ID_05:/hdfs_cmmd_prt name6_old
ID_11:/hdfs_cmmd_prt name6
NOTE: I don't understand the last part of OP's question ("then we can remove the original line"); for now I've matched what OP listed as the expected output. If the intent is to further edit the result, then we'll need the question updated to show the ultimate expected output.
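If holding the whole file in an array is a concern, the same counting idea can be sketched as a two-pass awk script that reads the file twice instead. This is an assumption-laden variant, not the method above: the scratch file in input and the shortened three-line sample are hypothetical stand-ins for old.file, and the rest of each line is taken with substr so that a later colon in the line would not truncate it.

```shell
input=$(mktemp)   # hypothetical scratch file standing in for old.file
cat > "$input" <<'EOF'
# a comment line that must pass through untouched
ID_00:/hdfs_01 name1
ID_01:/hdfs_02 name2
ID_02:/hdfs_03 name3
EOF

# Pass 1 (NR == FNR): only count the ID_ lines.
# Pass 2: print each ID_ line twice, everything else once.
result=$(awk '
NR == FNR { if ($0 ~ /^ID_/) c++; next }
/^ID_/ {
    print $0 "_old"                          # output 1: original label, _old suffix
    split($0, a, ":")                        # a[1] = "ID_##"
    split(a[1], b, "_")                      # b[2] = "##"
    rest = substr($0, index($0, ":") + 1)    # everything after the first colon
    printf "ID_%02d:%s\n", b[2] + c, rest    # output 2: shifted, zero-padded label
    next
}
{ print }
' "$input" "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Naming the file twice on the command line is what makes the NR == FNR test true only during the first read.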
CodePudding user response:
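The following one-liner works, with a couple of caveats:
- blank lines (and any other lines not containing hdfs) are omitted
- the number in the second line is not zero padded
- the offset 6 is hard-coded to the number of matching lines in the sample
awk -F":" '/hdfs/{ split($1, a, /_/); print $0 "_old\n" "ID_" (a[2] + 6) ":" $2 }' file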
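A possible variant that keeps non-matching lines and zero-pads the new number is sketched below. It assumes the offset is passed in through a hypothetical awk variable named off rather than written into the script, and it feeds a two-record sample through printf instead of reading the real file.

```shell
# Sample input: two records separated by a blank line.
result=$(printf '%s\n' 'ID_00:/hdfs_01 name1' '' 'ID_01:/hdfs_02 name2' |
  awk -F: -v off=2 '
    /hdfs/ {
        split($1, a, "_")                        # a[2] = numeric part of the label
        print $0 "_old"                          # first copy: original label, _old suffix
        printf "ID_%02d:%s\n", a[2] + off, $2    # second copy: shifted, zero-padded label
        next
    }
    { print }                                    # blank and other lines pass through unchanged
  ')
printf '%s\n' "$result"
```

With -F: the rest of the line is taken from $2, so this sketch still assumes the path and name contain no further colons.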