Replace the number after # using a cutoff and modify the column in a multicolumn file

Time:10-29

I have a file with multiple columns. Every column after the 4th has two parts: a part before # and a part after #. If the number after # is >20, I want to remove the # and the number that follows it, e.g. 0|0#99 becomes 0|0 because 99 > 20. If the number after # is <20, I want to replace the entire cell value with "./.", e.g. 0|0#14 becomes "./.". If there is a dot after #, I want to keep the value as it is, e.g. 0|0#. stays 0|0#. unchanged.

input_file.txt (the tab-separated file I have):

1   12345   A   T   0|0#.   0|0#.   0|0#14  0|0#.   0|0#.   0|0#20  0|0#15  0|0#40  0|0#99      
1   78906   C   T   0|0#99  0|0#.   0|0#10  0|0#.   0|0#45  0|0#20  0|0#95  0|0#78  0|0#99      

Expected output (cutoff 20):

1   12345   A   T   0|0#.   0|0#.   ./. 0|0#.   0|0#.   ./. ./. 0|0 0|0     
1   78906   C   T   0|0 0|0#.   ./. 0|0#.   0|0 ./. 0|0 0|0 0|0

I tried the following code, but I am not getting the desired output. Kindly help me resolve this.

awk -v FS="\t" -v OFS="\t" '{ for(i=1;i<=NF;i++) if ( $1 ~ /\#[>20]/ ) {print $0} else; {print"./."}}' input_file.txt

CodePudding user response:

For your limited input:

sed 's%0|0#1*[0-9] %./. %g; s/0|0#[2-9][0-9] /0|0 /g' input_file.txt

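  • The first substitution uses % as its delimiter because the replacement, ./., contains forward slashes.
  • <20 is matched as 1*[0-9] followed by a trailing space. Strictly, 1* allows any run of 1s, so 1\{0,1\}[0-9] is the tighter pattern.
  • >=20 is matched as [2-9][0-9], so 0|0#20 becomes 0|0 here, whereas the expected output in the question shows ./. for 20.
  • The g flag makes each substitution global, so every match on a line is replaced.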

Ack, I see you say "tab separated"; the paste into my system has spaces. With tabs in place of the spaces, the command becomes:

sed 's%0|0#1*[0-9]\t%./.\t%g; s/0|0#[2-9][0-9]\t/0|0\t/g' input_file.txt
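If the value 20 itself should become ./. (as the expected output in the question suggests) and the last column has no trailing tab, a variant along these lines might work. This is only a sketch: the function name strip_by_cutoff20 is made up for illustration, it assumes a sed that accepts -E (extended regular expressions), and it assumes the numbers carry no leading zeros.

```shell
# Hypothetical helper: reads stdin or the named files.
# The cutoff of 20 is hard-coded in the regular expression.
strip_by_cutoff20() {
    tab=$(printf '\t')    # literal tab, so we do not rely on \t support in sed
    # 1st substitution: a whole cell whose number is 0..20 becomes ./.
    # 2nd substitution: any remaining #number is stripped, keeping the prefix
    # Each pattern ends at a tab or at end of line; cells like 0|0#. never match.
    sed -E "s%[^${tab}]*#(1?[0-9]|20)(${tab}|\$)%./.\\2%g; s%#[0-9]+(${tab}|\$)%\\1%g" "$@"
}

# Usage: strip_by_cutoff20 input_file.txt
```

Because the cutoff lives inside the pattern, any other cutoff means rewriting the regular expression by hand, which is where awk becomes the more comfortable tool.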

CodePudding user response:

This awk command should do the trick:

awk -v cutoff=20 '
    BEGIN { FS=OFS="\t" }
    { for (i=5; i<=NF; ++i)
          if ($i ~ /#[0-9]/) {
              sub(/.*#/, "", $i)
              $i = ($i+0 > cutoff) ? "0|0" : "./."   # +0 forces a numeric comparison
          }
    } 1
' input_file.txt
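The command above always writes 0|0 for values over the cutoff, which fits the sample input. If the part before # can differ between cells (1|0, say), a variant that keeps whatever precedes the # might look like the sketch below. The function name filter_cutoff and the array name part are made up for illustration, and it assumes each such cell contains exactly one #.

```shell
# Hypothetical helper: filter_cutoff CUTOFF [FILE...]; reads stdin if no file is given.
filter_cutoff() {
    cutoff=$1
    shift
    awk -v cutoff="$cutoff" '
        BEGIN { FS = OFS = "\t" }
        {
            for (i = 5; i <= NF; ++i)
                if ($i ~ /#[0-9]+$/) {        # skip cells such as 0|0#.
                    split($i, part, "#")      # part[1] = prefix, part[2] = number
                    $i = (part[2] + 0 > cutoff) ? part[1] : "./."
                }
        }
        1                                     # print every (possibly modified) line
    ' "$@"
}

# Usage: filter_cutoff 20 input_file.txt
```

With a cutoff of 20, a cell like 1|0#99 would come out as 1|0, while 0|0#14 and 0|0#20 both become ./. because neither number is greater than the cutoff.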