Replace the numeric value-CodePudding

How do we mask the numeric data if the file is in excel format/csv

Input.xlsx/Input.csv 
Col1,Col2,Col3
A,B,-98.02
M,N,1003
P,Q,-1025.345

Output.xlsx/output.csv 
Col1,Col2,Col3
A,B,-78.14
M,N,1143
P,Q,-1245.745

I want to mask each numeric value to different value dynamically. Instead of original data it has to mask with dummy number in shell

CodePudding user response：

This works with CSV files, not with xlsx files:

gawk -F, 'BEGIN{ OFS=","}
          { for(i=1;i<=NF;i  ){ 
               x=$i; 
               gsub(/[0-9.-]*/,"",x); 
               if(x==""){ $i=rand()*100 }
          } print $0 }'  Input.csv

With above script every numeric value is replaced with a numeric value between 0 and 100 (with decimals)

output (example):

Col1,Col2,Col3
A,B,92.4046
M,N,59.3909
P,Q,30.6394

You can replace the rand()*100 with (almost) anything to get another range.

for(i=1;i<=NF;i ) loop over all field in the current line
gsub(/[0-9.-]*/,"",x) Will replace every number (0-9), or . or - with an empty string (""), effectively leaving an empty string when it's a number.
if(x=="") When x is empty replace the field with a random value.

NOTE: The final piece } print $0 }' can also be written as }}1', but I think using the first option is more clear.

EDIT: Keeping negative numbers negative, with (about) the same number of decimals

Create an awk script (i.e. replace.awk):

function rep(s) {
        sgn = s<0?-1:1
        nbc = int(log(sgn*s)/log(10)) 1
        dec = (s-int(s)==0)?0:length(sgn*s-int(sgn*s))-2
        #print "DEBUG" sgn, nbc, dec, "%"nbc"."dec"f"
        return sprintf("%"nbc"."dec"f",sgn*rand()*(10**nbc))
        }

BEGIN{ OFS=","
        srand()
}
{ for(i=1;i<=NF;i  ){
               x=$i;
               gsub(/[0-9.-]*/,"",x);
              if(x==""){ $i=rep($i) }
  }
}1

and do: gawk -F, -f replace.awk Input.csv

output should look like this (example):

Col1,Col2,Col3
A,B,-49.73
M,N,5577
P,Q,-551.278

The introduced variables with this solution:

sgn The sign of the number ( 1 or -1)
nbc Numbers Before Comma (Comma ==> meant is Decimal separator)
dec Number of digits after Comma (Comma ==> meant is Decimal separator)