Using SED to modify the same string twice and delete it

Time:09-07

I have a text file in which I need to find each line that matches a specific pattern and emit it twice, with a different modification each time, something like the following.

Let's say this is the original file:

ID_00:/hdfs_01              name1

ID_01:/hdfs_02              name2

ID_02:/hdfs_03              name3

ID_03:/hdfs_app_data_01     name4

ID_04:/hdfs_app_data_02     name5

ID_05:/hdfs_cmmd_prt        name6

You will notice that its structure is similar to that of an fstab.

The expected result is:

ID_00:/hdfs_01              name1_old
ID_06:/hdfs_01              name1

ID_01:/hdfs_02              name2_old
ID_07:/hdfs_02              name2

ID_02:/hdfs_03              name3_old
ID_08:/hdfs_03              name3

ID_03:/hdfs_app_data_01     name4_old
ID_09:/hdfs_app_data_01     name4

ID_04:/hdfs_app_data_02     name5_old
ID_10:/hdfs_app_data_02     name5

ID_05:/hdfs_cmmd_prt        name6_old
ID_11:/hdfs_cmmd_prt        name6

So far, I've considered doing a simple sed search for hdfs, as it is the only thing the lines have in common, writing the matches to a separate text file, modifying them, and then adding the entire modified block back to the source file.

I would do this twice: once to add the _old suffix and once to modify the ID.

Something like:

sed -n '/hdfs/p' >> new.file

Add the _old suffix to the lines in new.file with sed/awk and save the result in a different file.

Then do it again to update the ID, saving that result in another file.

With those two files holding the expected output, I can easily remove the records from the current file, add the new ones, and reload the app.

That is more or less the plan so far.
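
The plan above could be sketched roughly as follows. The file names (old.file, matched.tmp, with_old.tmp, with_new_id.tmp) are placeholders, the sample holds only two of the records, and awk is assumed for the ID arithmetic; the offset is taken to be the number of matched records, which is what the expected result suggests:

```shell
#!/bin/sh
# Sketch of the multi-file plan; every file name here is a placeholder.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Two sample records standing in for the real source file.
printf '%s\n' \
  'ID_00:/hdfs_01              name1' \
  'ID_01:/hdfs_02              name2' > old.file

# Step 1: collect the records that contain "hdfs".
sed -n '/hdfs/p' old.file > matched.tmp

# Step 2 (output #1): append the _old suffix.
sed 's/$/_old/' matched.tmp > with_old.tmp

# Step 3 (output #2): shift each ID by the number of matched records.
count=$(wc -l < matched.tmp)
awk -v c="$count" '{
    colon = index($0, ":")                        # position of the first colon
    split(substr($0, 1, colon - 1), id, "_")      # id[2] holds the 2-digit number
    printf "ID_%02d%s\n", id[2] + c, substr($0, colon)
}' matched.tmp > with_new_id.tmp

cat with_old.tmp with_new_id.tmp
```

Merging the two result files back into the source file is left out, since how the records should be interleaved depends on the application.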

But I would like to know whether this could be achieved using only sed and its buffer: save the original matched line, modify it once to add the _old suffix and print it [this is output #1], then modify the original line again to change the ID and print it [this is output #2]; after that the original line can be removed.
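
A minimal sketch of that buffer idea, using sed's hold space (`h` copies the pattern space into it, `g` copies it back). Because sed has no arithmetic, the ID change is assumed here to be a hard-coded mapping for the six sample IDs; the input is a shortened sample with an extra made-up line to show that other lines pass through untouched:

```shell
#!/bin/sh
# Sketch only: the six ID mappings below are hard-coded for the sample data.
result=$(printf '%s\n' \
  'ID_00:/hdfs_01              name1' \
  'unrelated line' \
  'ID_04:/hdfs_app_data_02     name5' |
sed '
/^ID_[0-9][0-9]:.*hdfs/{
  # keep an untouched copy of the line in the hold space
  h
  # output #1: add the suffix and print it
  s/$/_old/
  p
  # bring the original line back into the pattern space
  g
  # output #2: rewrite the ID; the end-of-cycle autoprint emits it
  s/^ID_00:/ID_06:/
  s/^ID_01:/ID_07:/
  s/^ID_02:/ID_08:/
  s/^ID_03:/ID_09:/
  s/^ID_04:/ID_10:/
  s/^ID_05:/ID_11:/
}')
printf '%s\n' "$result"
```

The unmodified original line is never printed, so no separate deletion step is needed. If the number of records is not known in advance, the mapping cannot be hard-coded and a tool with arithmetic, such as awk, is probably the simpler choice.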

CodePudding user response:

Assumptions:

  • lines of interest start with ID_##: and have no trailing white space (all other lines will be printed as is without any modifications)
  • the ## is a zero-padded, 2-digit number
  • the number of lines starting with ID_##: could vary and is not known beforehand so ...
  • we keep a count (c) of the number of lines starting with ID_##:
  • for new lines we add c to ## to generate a new ID_##: label (eg, ## = 02 and c = 6 so new label will be ID_08:)

I don't know sed well enough to know if a running count can be maintained and then used to 'add' to a string in the pattern space (assuming the objective is to use a single sed script) so since the question is also tagged with awk ...

One (verbose) awk idea:

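awk '
    { lines[NR]=$0                                 # buffer every line so the ID_ lines can be counted first
      if ($0 ~ /^ID_/) c++                         # c = number of lines starting with ID_
    }
END { for (i=1;i<=NR;i++) {
          if (lines[i] ~ /^ID_/) {
             print lines[i] "_old"                 # assumes line has no trailing white space otherwise we end up with "<space>_old"
             split(lines[i],a,":")                 # a[1] = "ID_##", a[2] = rest of the line
             split(a[1],b,"_")                     # b[2] = "##"
             newid=sprintf("%02d",b[2]+c)          # add the count and zero-pad to 2 digits
             print "ID_" newid ":" a[2]
          }
          else
             print lines[i]
      }
    }
' old.file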

This generates:

ID_00:/hdfs_01              name1_old
ID_06:/hdfs_01              name1

ID_01:/hdfs_02              name2_old
ID_07:/hdfs_02              name2

ID_02:/hdfs_03              name3_old
ID_08:/hdfs_03              name3

ID_03:/hdfs_app_data_01     name4_old
ID_09:/hdfs_app_data_01     name4

ID_04:/hdfs_app_data_02     name5_old
ID_10:/hdfs_app_data_02     name5

ID_05:/hdfs_cmmd_prt        name6_old
ID_11:/hdfs_cmmd_prt        name6

NOTE: I don't understand the last part of OP's question ("then we can remove the original line"); for now I've matched what OP listed as the expected output. If the intent is to edit the result further, the question will need to be updated to show the ultimate expected output.

CodePudding user response:

The following one-liner works with a couple of caveats:

  • blank lines are omitted
  • the number in the second line is not zero padded

awk -F":" '/hdfs/{ split($1, a, /_/); print $0 "_old\n" "ID_" (a[2] + 6) ":" $2 }' file
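
Both caveats could probably be avoided by letting awk read the file twice: a first pass to count the ID lines and a second to print. This is only a sketch; it assumes the lines of interest start with ID_ followed by exactly two digits and a colon, and the temporary file stands in for the real input:

```shell
#!/bin/sh
# Sketch: the temporary file stands in for the real input file.
input=$(mktemp) || exit 1
printf '%s\n' \
  'ID_00:/hdfs_01              name1' \
  '' \
  'ID_01:/hdfs_02              name2' > "$input"

result=$(awk '
# Pass 1 (first copy of the file): count the ID_##: lines.
NR == FNR { if ($0 ~ /^ID_[0-9][0-9]:/) c++; next }
# Pass 2: print each ID line twice, everything else unchanged.
/^ID_[0-9][0-9]:/ {
    print $0 "_old"
    printf "ID_%02d%s\n", substr($0, 4, 2) + c, substr($0, 6)
    next
}
{ print }
' "$input" "$input")
rm -f "$input"
printf '%s\n' "$result"
```

Naming the file twice on the command line is what makes `NR == FNR` true only during the first pass, so nothing has to be buffered in memory.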