Assume I have a file input.txt with the following contents:
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
I want to create an output.txt which looks like this:
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
The approach proposed in "Copy contents from capture group to another subsequent line":
awk '/[0-9]{4}./ {match($0,"([0-9]{4})",n);}{gsub(/replace-with-code/,n[0]); print}' inputfile > outputfile
This returns an error, and I have not been able to fix it. Can any awk magicians help me here?
CodePudding user response:
Here is an awk solution:
awk '{gsub(/replace-with-code/, p)} /[0-9]{4}\.$/ {p = $NF+0} 1' file
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
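For readers who find the one-liner dense, the same idea can be spelled out with comments. This is only a sketch: the `input` and `result` shell variables are hypothetical names introduced here, and `[0-9]+` is used instead of `{4}` on the assumption that some awk implementations do not support interval expressions.

```shell
# Hypothetical sample data mirroring the question's input.txt.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk '
    # Fill the placeholder with the most recently remembered code.
    { gsub(/replace-with-code/, p) }
    # A line ending in digits plus a dot is an introduction line:
    # $NF is e.g. "1234." and adding 0 coerces it to the number 1234.
    /[0-9]+\.$/ { p = $NF + 0 }
    # Print every line, changed or not.
    { print }
')
printf '%s\n' "$result"
```

Note that the numeric coercion drops leading zeros, so a code such as 0012 would come out as 12.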
CodePudding user response:
If you are not worried about the blank lines between the actual data lines, try the following GNU awk program. It uses GNU awk's match function with the regex (^|\n)([^:]*: )([0-9]+)(\.\n[^\n]+\n[^:]*: ), which captures the values we need so that we can print them in the required order.
awk -v RS= '
match($0,/(^|\n)([^:]*: )([0-9]+)(\.\n[^\n]+\n[^:]*: )/,arr){
print arr[1] arr[2] arr[3] arr[4] arr[3]
}
' Input_file
Explanation: a detailed explanation of the regex used in the above awk program.
(^|\n) ##1st capturing group: matches the start of the record or a newline.
([^:]*: ) ##2nd capturing group: matches everything up to the first occurrence of colon followed by space.
([0-9]+) ##3rd capturing group: matches 1 or more digits.
(\.\n[^\n]+\n[^:]*: ) ##4th capturing group: matches a literal dot followed by a newline, followed by
##one or more non-newline characters, followed by a newline, up to the very first occurrence of colon followed by space.
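The three-argument form of match used above is a GNU awk extension. As a rough portable alternative, the sketch below uses the two-argument match, which sets RSTART and RLENGTH, together with paragraph mode. It assumes the blocks are separated by blank lines; the `input` and `result` variables are hypothetical names introduced for the demonstration.

```shell
# Hypothetical sample data: two blocks separated by a blank line.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk '
    # Paragraph mode: each blank-line-separated block is one record.
    BEGIN { RS = ""; ORS = "\n\n" }
    # Two-argument match() records where the first run of digits is.
    match($0, /[0-9]+/) {
        code = substr($0, RSTART, RLENGTH)
        sub(/replace-with-code/, code)
    }
    { print }
')
printf '%s\n' "$result"
```

Unlike the capture-group version, this one prints each whole record, so any text after the placeholder is kept.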
CodePudding user response:
I would exploit GNU AWK paragraph mode for this task in the following way. Let the content of file.txt be
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
then
awk 'BEGIN{RS="";ORS="\n\n"}{print gensub(/([0-9]+)(.*)replace-with-code/, "\\1\\2\\1", 1)}' file.txt
gives output
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
Explanation: setting RS to the empty string makes GNU AWK treat rows as separated by blank lines rather than newlines. I set ORS to preserve such lines. Then for every row I use the gensub function to capture the number, which I simply define as 1 or more digits, and whatever is between said number and replace-with-code. The latter I do not capture, as it will not be kept. Then I replace the match with what was captured, in the desired order. Disclaimer: this solution assumes that you never have more than 1 consecutive blank line.
(tested in gawk 4.2.1)
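To see what paragraph mode does to record splitting, and why the disclaimer matters, here is a small sketch with made-up data (the lines a1, a2, b1, b2 and the `result` variable are hypothetical). Setting FS to a newline makes each line of a block one field.

```shell
# Two blocks separated by THREE blank lines instead of one.
result=$(printf 'a1\na2\n\n\n\nb1\nb2\n' | awk '
    BEGIN { RS = ""; FS = "\n" }
    # Print the record number and the first line of each record.
    { print NR ": " $1 }
')
printf '%s\n' "$result"
```

This should print `1: a1` and `2: b1`: the run of blank lines counts as a single separator, so printing with ORS="\n\n" would collapse it to one blank line in the output.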
CodePudding user response:
Skip all the gensub()/gsub() calls: no match() needed, no arrays needed, no capture groups needed … just make it a basic if-then-else:
echo "${input_data….}" | mawk 'NF<=!__ || $NF =_=/replace-with-code$/ ? _ : $NF' FS=': ' OFS=': '
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
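The if-then-else idea can also be written out longhand. The following is a sketch of that idea and not a transcription of the one-liner above; the `input`, `result`, and `code` names are hypothetical. It splits each line on ": " and either remembers the last field or overwrites it.

```shell
# Hypothetical sample data mirroring the question's input.txt.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code
Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk -F': ' -v OFS=': ' '
    # Only lines containing ": " have more than one field.
    NF > 1 {
        if ($NF == "replace-with-code") {
            $NF = code              # fill in the remembered code
        } else {
            code = $NF              # remember the value after ": "
            sub(/\.$/, "", code)    # drop the trailing dot
        }
    }
    { print }
')
printf '%s\n' "$result"
```

Because the code is kept as a string, leading zeros survive, at the cost of one explicit sub() call to strip the dot.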