Replace only alphanumeric chars from strings in one file in another

file1 contains records that I want to find in file2. For every match, only the alphanumeric characters should be replaced with #, and the result should be written to file3. With the code below I don't get the expected output. What am I doing wrong?

file_read=`cat file2`
while read line; do
  var=`echo $line | tr '[a-zA-Z0-9]' '#'`
  rep=`echo $file_read | awk "{gsub(/$line/,\"$var\"); print}"`
done < file1
echo file2 > file3

cat file1

2001009
@vanti Finserv Co.
2001009
Fund #1
11:11 - Capital
MS&CO(NY)
American Friends Org, Inc. 12X32
Domain-Name (LLC)
MS&CO(NY)
MS&CO(NY)
Ivy/Estate Rd
E*Trade wholesale

cat file2

<html>
<body>
<hr><br><>span >Records</span><table>
<tr >
 <td>Rec1</td>
 <td>Rec2</td>
 <td>Rec3</td>
 <td>Rec4</td>
 <td>Rec5</td>
 <td>Rec6</td>
 <td>Rec7</td>
 <td>Rec8</td>
</tr>
<tr >
<td>@vanti Finserv Co.</td>
<td>11:11 - Capital</td>
<td>MS&CO(NY)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>E*Trade wholesale</td>
<td>Domain-Name (LLC)</td>
<td>Ivy/Estate Rd</td>
<td></td>
</tr>
<tr >
<td>@vanti Finserv Co.</td>
<td></td>
<td>MS&CO(NY)</td>
<td>2</td>
<td>2</td>
<td>MS&CO(NY)</td>
<td>MS&CO(NY)</td>
<td>Ivy/Estate Rd</td>
</table>
</body>
</html>

expected output cat file3

<html>
<body>
<hr><br><>span >Records</span><table>
<tr >
 <td>Rec1</td>
 <td>Rec2</td>
 <td>Rec3</td>
 <td>Rec4</td>
 <td>Rec5</td>
 <td>Rec6</td>
 <td>Rec7</td>
 <td>Rec8</td>
</tr>
<tr >
<td>@##### ####### ##.</td>
<td>##:## - #######</td>
<td>##&##(##)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>#*##### ########</td>
<td>######-#### (###)</td>
<td>###/###### ##</td>
<td></td>
</tr>
<tr >
<td>@##### ####### ##.</td>
<td></td>
<td>##&##(##)</td>
<td>2</td>
<td>2</td>
<td>##&##(##)</td>
<td>##&##(##)</td>
<td>###/###### ##</td>
</table>
</body>
</html>

CodePudding user response：

You seem to be looking for something like

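awk 'NR==FNR {
  # First file: build a regex that matches the phrase literally
  regex = $0;
  gsub(/[][(){}|\\*+?.^$]/, "\\\\&", regex);
  a[++n] = regex;

  # ... and the replacement, with letters and digits masked
  gsub(/[A-Za-z0-9]/, "#");
  gsub(/&/, "\\\\&");
  b[n] = $0;

  next
}
{ for(i=1;i<=n;++i)
    gsub(a[i], b[i])
} 1' file1 file2 >file3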

In brief, we populate the array a with the phrases from file1, and b with the corresponding replacement strings. The condition NR==FNR is true only while we read the first input file, and the next at the end of that block skips the rest of the script. For the second file the condition is false, so we fall through to the rest of the script, which replaces any strings from a with the corresponding string from b; the final 1 prints every line.
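To see the two-file idiom in isolation, here is a minimal sketch. The function name two_file_demo and the sample lines are made up for illustration; the script only tags each line of the second input according to whether it appeared in the first.

```shell
# Hypothetical demo: remember the lines of the first file, then tag the second.
two_file_demo() {
  dir=$(mktemp -d) || return 1
  printf '%s\n' 'alpha' 'beta' > "$dir/first"
  printf '%s\n' 'beta' 'gamma' 'alpha' > "$dir/second"
  awk '
    NR==FNR { seen[$0] = 1; next }   # first file only: store the line, skip the rest
    {                                # second file: look the line up
      tag = ($0 in seen) ? "known" : "unknown"
      print tag ": " $0
    }
  ' "$dir/first" "$dir/second"
  rm -r "$dir"
}
two_file_demo
```

One caveat with this idiom: if the first file is empty, NR==FNR also holds while reading the second file, so the lookup block never runs.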

The code is complicated somewhat by the escaping of regex metacharacters in a and further by the fact that & in the replacement string needs to be escaped, too (& alone recalls the matched text).
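The two escaping steps may be easier to follow on a single phrase. The sketch below prints the regex and the replacement string that would be built for one input; show_escapes is a hypothetical helper name, and the backslash handling assumes the POSIX rules for gsub replacement strings as documented for GNU Awk.

```shell
# Hypothetical helper: show what ends up in a[] and b[] for one phrase.
show_escapes() {
  printf '%s\n' "$1" | awk '{
    regex = $0
    gsub(/[][(){}|\\*+?.^$]/, "\\\\&", regex)  # backslash-escape regex metacharacters
    repl = $0
    gsub(/[A-Za-z0-9]/, "#", repl)             # mask letters and digits
    gsub(/&/, "\\\\&", repl)                   # keep & literal in the replacement
    print "regex: " regex
    print "repl:  " repl
  }'
}
show_escapes 'MS&CO(NY)'
```

For MS&CO(NY) this should print the regex MS&CO\(NY\) and the replacement ##\&##(##): the parentheses are escaped only in the regex, and the & only in the replacement.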

Demo: https://ideone.com/YkAkAZ

You generally want to avoid while read loops in the shell; Awk is much faster and more idiomatic when you want to perform some transformation on all lines in a file.

As a further aside, please try http://shellcheck.net/ before asking for human assistance. Even after you fixed syntax errors pointed out in comments, your attempt contains common beginner errors such as broken quoting.

CodePudding user response：

Would you please try the following:

awk '
    NR==FNR {s = $0; gsub("[[:alnum:]]", "#"); a[s] = $0; next}
    {
        if (match($0, ">[^<]+")) {
            str = substr($0, RSTART + 1, RLENGTH - 1)
            if (str in a) {
                $0 = substr($0, 1, RSTART) a[str] substr($0, RSTART + RLENGTH)
            }
            }
        }
    }
1 ' file1 file2 > file3

It assumes that each string to be replaced makes up the entire text between two tags, and it only looks at the first such text on a line. Both hold for the example shown.
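If a line can hold more than one tagged string, the same lookup can be repeated across the line. The following is only a sketch under that assumption: mask_tagged is a hypothetical function name, it uses [A-Za-z0-9] as in the question's tr call, and it still requires a phrase to be the entire text between two tags.

```shell
# Hypothetical variant: mask every tagged phrase on a line, not just the first.
# Usage: mask_tagged file1 file2 > file3
mask_tagged() {
  awk '
    NR==FNR { s = $0; gsub(/[A-Za-z0-9]/, "#"); a[s] = $0; next }
    {
      out = ""; rest = $0
      # Consume the line one ">text" chunk at a time
      while (match(rest, />[^<]+/)) {
        str = substr(rest, RSTART + 1, RLENGTH - 1)
        repl = (str in a) ? a[str] : str          # masked form if the phrase is known
        out = out substr(rest, 1, RSTART) repl
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }
  ' "$1" "$2"
}
```

Because the masked text is concatenated rather than passed through gsub, neither regex metacharacters nor & need escaping here.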