Find and replace thousands of files with a complex regular expression

Time:08-11

I need to run a find and replace over thousands of files, matching on this expression

/typedef((?:\\w+|\\s+)+?)(\\w+);/

and replacing the match with this

#ifndef $2\ntypedef$1 $2\n#endif

The ultimate goal is to take a bunch of header files and make sure they don't have conflicting definitions. So it would replace lines like

typedef unsigned char       __uint8_t;

with

#ifndef __uint8_t
 typedef unsigned char      __uint8_t;
#endif

I have tried to use awk with gensub, but I don't get the expected result:

awk '{ 
print gensub(/typedef((?:\\w+|\\s+)+?)(\\w+);/, "#ifndef \2\ntypedef\1 \2\n#endif", "g", $0) 
}' myheader.h

If I run the above command, the input is printed unchanged. If I change the target from $0 to $1, I get

/*
*
*
*/

#ifdef
typedef
#else
typedef
#endif
typedef
typedef
typedef
typedef
typedef
typedef
typedef

typedef
typedef

/*
*
*

I'm not sure that awk is the right tool given the complexity of the expression. Is this something awk can handle and I'm just using it wrong, or is there a better approach?

CodePudding user response:

Not sure I understand the significance of the regex (eg, should all typedef entries be converted?) ...

Setup:

$ cat myheader.h
typedef unsigned char       __uint8_t;
leave this line alone
typedef unsigned int       __whatever;
leave this line alone
typedef some other stuff char       __pick_me;
leave this line alone

Assuming all typedef entries are to be converted, one awk idea:

awk '
/typedef/ { $0= "#ifndef " $NF ORS " " $0 ORS "#endif" }
1
' myheader.h

This generates:

#ifndef __uint8_t;
 typedef unsigned char       __uint8_t;
#endif
leave this line alone
#ifndef __whatever;
 typedef unsigned int       __whatever;
#endif
leave this line alone
#ifndef __pick_me;
 typedef some other stuff char       __pick_me;
#endif
leave this line alone
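
Note that $NF still carries the trailing semicolon, which is why the guards above read "#ifndef __uint8_t;". If that is not wanted, here is a minimal sketch that strips it before building the guard. The function name wrap_typedefs is made up for illustration, and it assumes each typedef sits on one line and ends in "identifier;" (function-pointer, array, and multi-line typedefs are passed through untouched):

```shell
#!/bin/sh
# Sketch only: wrap_typedefs is a hypothetical helper name.
# Wraps single-line typedefs that end in "identifier;" and leaves
# every other line as it is.
wrap_typedefs() {
    awk '
    /^[ \t]*typedef[ \t].*[A-Za-z0-9_][ \t]*;[ \t]*$/ {
        name = $0
        sub(/[ \t]*;[ \t]*$/, "", name)      # drop the trailing semicolon
        sub(/^.*[^A-Za-z0-9_]/, "", name)    # keep only the last identifier
        print "#ifndef " name
        print " " $0
        print "#endif"
        next
    }
    { print }
    ' "$@"
}

# Usage: wrap_typedefs myheader.h
```

It sticks to plain bracket expressions rather than \w, so it should not need GNU awk.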

Once the result is verified, and if using GNU awk, you can replace awk with awk -i inplace to overwrite the original file(s).
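
If GNU awk is not available, a rough portable equivalent is to write each result to a temporary file and copy it back. The sketch below assumes the headers are the *.h files under one directory and that no file name contains a newline. guard_typedefs_in_tree is a hypothetical name, and the awk program is the simple wrap-every-typedef rule from earlier, so swap in whichever version you settled on:

```shell
#!/bin/sh
# Sketch only: guard_typedefs_in_tree is a hypothetical helper name.
# Rewrites every *.h file under the given directory by running awk into
# a temp file and copying the result back over the original.
guard_typedefs_in_tree() {
    dir=$1
    find "$dir" -type f -name '*.h' | while IFS= read -r f; do
        tmp=$(mktemp) || continue
        if awk '/typedef/ { $0 = "#ifndef " $NF ORS " " $0 ORS "#endif" } 1' "$f" > "$tmp"; then
            cat "$tmp" > "$f"    # overwrite contents, keeping the file's permissions
        fi
        rm -f "$tmp"
    done
}

# Usage: guard_typedefs_in_tree /path/to/headers
```

Try it on a copy of the tree first, since the files are overwritten with no backup.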


One idea for extending this answer to keep a re-run of the script from generating duplicate #ifndef/#endif pairs, and assuming the #ifndef and typedef are on consecutive lines:

awk '
/^#ifndef/       { defname=$NF; defline=FNR }   # remember the most recent guard
FNR>(defline+1)  { defname="" }                 # a guard only covers the very next line
/typedef/        { if (defname != $NF) $0= "#ifndef " $NF ORS " " $0 ORS "#endif" }
1
' myheader.h

CodePudding user response:

You can try using GNU awk

$ awk '{match($0,/(typedef[^_]*(\w+);)/,a); print "#ifndef "a[2]"\n "a[1]"\n#endif"}' input_file

Or using GNU sed

$ sed -E 's/typedef([^_]*)((\w+);)/#ifndef \3\n &\n#endif/' input_file

Output

#ifndef __uint8_t
 typedef unsigned char       __uint8_t;
#endif
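
As for why the original gensub attempt printed the input unchanged, a few things are likely at play. awk regexes are POSIX EREs, so neither the (?:...) non-capturing group nor the lazy +? is understood. The doubled backslashes in a /.../ regex constant match a literal backslash rather than \w or \s. Inside a double-quoted awk string, "\2" is read as an octal escape, so the back-references need to be written "\\2". And with $1 as the target, gensub only sees the first whitespace-separated field, which is why only the first word of each line came out. Here is a sketch of the same idea in plain ERE, assuming GNU awk (gensub is a gawk extension) and single-line typedefs. guard_with_gensub is a made-up name:

```shell
#!/bin/sh
# Sketch only: guard_with_gensub is a hypothetical helper name.
# Requires GNU awk, since gensub() is a gawk extension.
guard_with_gensub() {
    gawk '{
        # \\1 = everything between "typedef" and the name, \\2 = the name
        print gensub(/typedef(.*[^[:alnum:]_])([[:alnum:]_]+);/,
                     "#ifndef \\2\ntypedef\\1\\2;\n#endif", 1)
    }' "$@"
}

# Usage: guard_with_gensub myheader.h
```

Lines that do not match are printed unchanged, because gensub returns its target untouched when there is no match.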