I have a file with multiple columns. Every column after the 4th has two parts: a part before # and a part after #. If the number after # is greater than 20, I want to remove the # and the number, e.g. 0|0#99 becomes 0|0 because 99 > 20. If the number after # is less than 20, I want to replace the entire cell with "./.", e.g. 0|0#14 becomes "./.". A value of exactly 20 should also become "./." (as the desired output shows for 0|0#20). If there is a dot after #, the value stays as it is, e.g. 0|0#. is left unchanged.
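The rule for a single cell can be sketched as a small shell function wrapping awk. This is only an illustration: the name mask_cell is hypothetical, the cutoff defaults to 20, and a value of exactly 20 is treated as "./." to match the desired output.

```shell
#!/bin/sh
# Sketch of the masking rule for one "genotype#number" cell.
# mask_cell is a hypothetical helper name; $2 is the cutoff (default 20).
mask_cell() {
    printf '%s\n' "$1" | awk -v cutoff="${2:-20}" '{
        # split "0|0#99" into p[1] = "0|0" and p[2] = "99"
        n = split($0, p, "#")
        if (n != 2 || p[2] !~ /^[0-9]+$/)
            print $0              # no number after "#" (e.g. "0|0#."): keep as is
        else if (p[2] + 0 > cutoff + 0)
            print p[1]            # above the cutoff: drop "#number"
        else
            print "./."           # at or below the cutoff: mark as missing
    }'
}

mask_cell '0|0#99'   # 0|0
mask_cell '0|0#14'   # ./.
mask_cell '0|0#20'   # ./.
mask_cell '0|0#.'    # 0|0#.
```

The `+ 0` on both sides forces awk to compare the values as numbers rather than as strings.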
My input file, input_file.txt, is tab-separated:
1 12345 A T 0|0#. 0|0#. 0|0#14 0|0#. 0|0#. 0|0#20 0|0#15 0|0#40 0|0#99
1 78906 C T 0|0#99 0|0#. 0|0#10 0|0#. 0|0#45 0|0#20 0|0#95 0|0#78 0|0#99
Desired output (cutoff 20):
1 12345 A T 0|0#. 0|0#. ./. 0|0#. 0|0#. ./. ./. 0|0 0|0
1 78906 C T 0|0 0|0#. ./. 0|0#. 0|0 ./. 0|0 0|0 0|0
I tried the following code, but I am not getting the desired output. Kindly help me resolve this.
awk -v FS="\t" -v OFS="\t" '{ for(i=1;i<=NF;i++) if ( $1 ~ /\#[>20]/ ) {print $0} else; {print"./."}}' input_file.txt
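Part of the problem is that [>20] is a bracket expression: it matches one character that is >, 2 or 0, not "a number greater than 20" (the test also looks at $1 instead of $i). A small sketch showing which cells that pattern actually selects; the helper name which_match is hypothetical.

```shell
#!/bin/sh
# Sketch: print only the cells matched by the pattern /#[>20]/.
# which_match is a hypothetical helper name.
which_match() {
    printf '%s\n' "$@" | awk '/#[>20]/'
}

which_match '0|0#99' '0|0#20' '0|0#40' '0|0#05' '0|0#.'
# prints 0|0#20 and 0|0#05: the character right after "#" is 2 or 0
```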
Answer:
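For your limited input:
sed 's%0|0#\(1*[0-9]\|20\)\( \|$\)%./.\2%g; s%0|0#[2-9][0-9]\( \|$\)%0|0\1%g' input_file.txt
- It's important to pick a delimiter for the s command other than forward slash, because the replacement "./." contains one.
- A number of 20 or less is matched as 1*[0-9] or 20; a number above 20 is matched as [2-9][0-9] (20 itself has already been replaced by the first substitution).
- Each number must be followed by a space or the end of the line (the last column has no trailing separator), so only the whole number is matched.
- The g flag replaces every match on the line, not just the first.
Ack. I see you said "tab separated"; the paste into my system has spaces. With tabs, the spaces become \t:
sed 's%0|0#\(1*[0-9]\|20\)\(\t\|$\)%./.\2%g; s%0|0#[2-9][0-9]\(\t\|$\)%0|0\1%g' input_file.txt
Both commands use \| alternation, and the second one uses \t; these are GNU sed extensions.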
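The separator required after the number is what keeps the pattern from matching only the first digit. A sketch of what happens without it, using the same "below the cutoff" pattern on a single cell (the helper name no_delim is hypothetical):

```shell
#!/bin/sh
# Sketch: the "<20" pattern from the answer, but with no separator after the digits.
# no_delim is a hypothetical helper name.
no_delim() {
    printf '%s\n' "$1" | sed 's%0|0#1*[0-9]%./.%'
}

no_delim '0|0#14'   # ./.   (correct by luck: 1*[0-9] consumes both digits)
no_delim '0|0#99'   # ./.9  (wrong: only the first 9 is matched)
```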
Answer:
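This awk command should do the trick:
awk -v cutoff=20 '
BEGIN { FS=OFS="\t" }
{ for (i=5; i<=NF; i++)
    if ($i ~ /#[0-9]+$/) {
        n = gt = $i
        sub(/.*#/, "", n)     # n: the number after "#"
        sub(/#.*/, "", gt)    # gt: the genotype before "#", e.g. 0|0
        $i = (n+0 > cutoff) ? gt : "./."   # n+0 forces a numeric comparison
    }
} 1
' input_file.txt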
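Why the + 0 matters: awk compares two string values character by character, so "9" > "20" is true and "100" > "20" is false. Text produced by sub() is generally treated as a plain string, so forcing a number avoids that trap. A sketch contrasting the two comparisons (compare_both is a hypothetical helper name; it prints 1 for true and 0 for false):

```shell
#!/bin/sh
# Sketch: string vs numeric comparison in awk.
# compare_both is a hypothetical helper name.
compare_both() {
    awk -v n="$1" 'BEGIN {
        s = n ""                  # force a plain string value
        as_string = (s > "20")    # compared character by character
        as_number = (s + 0 > 20)  # compared numerically
        print as_string, as_number
    }'
}

compare_both 9     # 1 0
compare_both 99    # 1 1
compare_both 100   # 0 1
```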