Copy the string that matches a regex to another location using awk

Time:07-30

Assume I have a file input.txt with the following contents:

Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

I want to create an output.txt which looks like this:

Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122

The approach proposed in "Copy contents from capture group to another subsequent line":

awk '/[0-9]{4}./ {match($0,"([0-9]{4})",n);}{gsub(/replace-with-code/,n[0]); print}' inputfile > outputfile

returns an error, and I have not been able to fix it. Can any awk magicians help me here?

CodePudding user response:

Here is an awk solution:

awk '{gsub(/replace-with-code/, p)} /[0-9]{4}\.$/ {p = $NF + 0} 1' file

Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
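The one-liner above is easier to follow when spread over several lines. The following is only a sketch of the same idea, assuming a POSIX awk; the variable names input, code and result are introduced here for illustration, and [0-9]+ is used instead of {4} because some older awk builds lack interval expressions.

```shell
# Sample data from the question, held in a shell variable.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk '
  # Swap the placeholder for the most recently remembered code.
  { gsub(/replace-with-code/, code) }
  # Line ending in digits and a dot: adding 0 turns "1234." into 1234.
  /[0-9]+\.$/ { code = $NF + 0 }
  # Print every line, changed or not.
  { print }
')
printf '%s\n' "$result"
```

Because the code is converted to a number, leading zeros would be lost (a last field of 0042. would be stored as 42).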

CodePudding user response:

If you are not worried about the blank lines between the data blocks, try the following GNU awk program. It uses GNU awk's match function with the regex (^|\n)([^:]*: )([0-9]+)(\.\n[^\n]+\n[^:]*: ) to capture the pieces we need, and then prints them in the required order.

awk -v RS= '
match($0,/(^|\n)([^:]*: )([0-9]+)(\.\n[^\n]+\n[^:]*: )/,arr){
  print arr[1] arr[2] arr[3] arr[4] arr[3]
}
'  Input_file

Explanation: a detailed breakdown of the regex used in the awk program above.

(^|\n)                  ## 1st capturing group: start of the record or a newline.
([^:]*: )               ## 2nd capturing group: everything up to the first colon followed by a space.
([0-9]+)                ## 3rd capturing group: 1 or more digits.
(\.\n[^\n]+\n[^:]*: )   ## 4th capturing group: a literal dot, a newline, a run of non-newline characters,
                        ## a newline, then everything up to the next colon followed by a space.
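The array argument to match() is a GNU awk extension. For awks that lack it, a similar capture can be sketched with the POSIX two-argument match(), which sets RSTART and RLENGTH. This is a line-by-line variant rather than the paragraph-mode program above, and the names input, code and result are illustrative assumptions.

```shell
# Sample data from the question, held in a shell variable.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk '
  # match() sets RSTART and RLENGTH for "digits plus trailing dot";
  # RLENGTH - 1 drops the dot, so code keeps the digits as a string.
  match($0, /[0-9]+\.$/) { code = substr($0, RSTART, RLENGTH - 1) }
  # Fill in the placeholder with the remembered digits, then print.
  { gsub(/replace-with-code/, code); print }
')
printf '%s\n' "$result"
```

Since the digits are kept as a string here, leading zeros survive.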

CodePudding user response:

I would exploit GNU AWK's paragraph mode for this task in the following way. Let the content of file.txt be

Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

then

awk 'BEGIN{RS="";ORS="\n\n"}{print gensub(/([0-9]+)(.*)replace-with-code/, "\\1\\2\\1", 1)}' file.txt

gives output

Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122

Explanation: setting RS to the empty string tells GNU AWK that records are separated by blank lines rather than newlines. I set ORS to "\n\n" to preserve those blank lines in the output. Then, for every record, I use the gensub function to capture the number, which I simply define as 1 or more digits, and whatever lies between that number and replace-with-code; the placeholder itself is not captured, as it will not be kept. The match is then replaced with the captured pieces in the desired order. Disclaimer: this solution assumes that there is never more than 1 consecutive blank line.

(tested in gawk 4.2.1)
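To see what paragraph mode does to the input, it can help to print how awk splits it. The sketch below assumes a POSIX awk (RS="" is not specific to gawk) and sets FS to a newline so that each line of a block becomes one field; the names input and result are illustrative.

```shell
# Sample data from the question, held in a shell variable.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk '
  # Empty RS: blank lines separate records. FS of newline: one field per line.
  BEGIN { RS = ""; FS = "\n" }
  # Report the record number, its line count and its first line.
  { print NR ": " NF " lines, first line = " $1 }
')
printf '%s\n' "$result"
```

Each blank-line-separated block arrives as a single record, which is why one gensub call can see both the number and the placeholder.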

CodePudding user response:

skip all the gensub()/gsub(), no match() needed, no arrays needed, no capture groups needed … and just make it a basic if-then-else :

echo "${input_data….}" |

mawk 'NF<=!__ || $NF =_=/replace-with-code$/ ? +_ : $NF' FS=': ' OFS=': '
Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: 1234

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: 1122
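The same if-then-else idea can be written out long-hand. This is a sketch only, assuming a POSIX awk and the same ': ' field separators; the names input, code and result are introduced here for illustration.

```shell
# Sample data from the question, held in a shell variable.
input='Hello my name is: 1234.
My favorite color is blue.
This was my code from the introduction line: replace-with-code

Hello my name is: 1122.
My favorite color is blue.
This was my code from the introduction line: replace-with-code'

result=$(printf '%s\n' "$input" | awk -F': ' -v OFS=': ' '
  # Only lines of the form "label: value" have exactly two fields.
  NF == 2 {
    if ($2 == "replace-with-code") {
      $2 = code              # placeholder line: fill in the remembered code
    } else {
      code = $2              # introduction line: remember its value
      sub(/\.$/, "", code)   # and drop the trailing dot
    }
  }
  # Print every line, changed or not.
  { print }
')
printf '%s\n' "$result"
```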