How to replace a number with another number in a specific column using awk

Time:10-08

This is probably basic but I am completely new to command-line and using awk. I have a file like this:

1 RQ22067-0 -9
2   RQ34365-4   1
3   RQ34616-4   1
4   RQ34720-1   0
5   RQ14799-8   0
6   RQ14754-1   0
7   RQ22101-7   0
8   RQ22073-1   0
9   RQ30201-1   0

I want the 0s in column 3 to change to 1, and any occurrence of 1 (or 2) in column 3 to become 2. So essentially I am only changing numbers in column 3, and I am not changing the -9. The expected output is:

1 RQ22067-0 -9
2   RQ34365-4   2
3   RQ34616-4   2
4   RQ34720-1   1
5   RQ14799-8   1
6   RQ14754-1   1
7   RQ22101-7   1
8   RQ22073-1   1
9   RQ30201-1   1

I have tried the following, but it has not worked:

awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno

Thank you.

CodePudding user response:

You can check the value of column 3 and then update the field value.

Check for 1 as the first rule because if the first check is for 0, the value will be set to 1 and the next check will set the value to 2 resulting in all 2's.

awk '
{
  if($3==1) $3 = 2
  if($3==0) $3 = 1
}
1' file

Output

1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
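
The ordering concern above goes away if the two checks are made mutually exclusive. A minimal sketch, assuming three rows of the sample data are fed in through a pipe rather than read from a file: with else if, each row is changed at most once, so the tests can appear in either order.

```shell
# Three rows from the sample data, piped in instead of read from a file.
result=$(printf '1 RQ22067-0 -9\n2   RQ34365-4   1\n4   RQ34720-1   0\n' |
  awk '{
    # else-if: a row that was just set to 1 is never re-tested against 1.
    if ($3 == 0)      $3 = 1
    else if ($3 == 1) $3 = 2
  } 1')
printf '%s\n' "$result"
# 1 RQ22067-0 -9
# 2 RQ34365-4 2
# 4 RQ34720-1 1
```

Assigning to $3 makes awk rebuild the line with single spaces between fields, which is why the modified rows lose their original spacing.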

CodePudding user response:

With your shown samples, try the following code, which uses ternary operators. Simple explanation: if the 3rd field is 1, set it to 2; otherwise, if it is 0, set it to 1; otherwise keep it as it is. Finally, print the line.

awk '{$3=$3==1?2:($3==0?1:$3)} 1' Input_file


Generic solution: here is a generic version that takes 3 awk variables. fieldNumber is a comma-separated list of the field numbers to check, existValue is a comma-separated list of the values to match, and newValue is a comma-separated list of the replacement values, in the same order as existValue.

awk -v fieldNumber="3" -v existValue="1,0" -v newValue="2,1" '
BEGIN{
  num=split(fieldNumber,arr1,",")
  num1=split(existValue,arr2,",")
  num2=split(newValue,arr3,",")
  for(i=1;i<=num1;i++){
    value[arr2[i]]=arr3[i]
  }
}
{
  for(i=1;i<=num;i++){
    if($arr1[i] in value){
       $arr1[i]=value[$arr1[i]]
     }
  }
}
1
'  Input_file
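
The core idea of the generic version is a lookup table from old value to new value. A stripped-down sketch of just that idea, hard-coding column 3 and the two mappings (the array name map is only illustrative), could look like this. Because each field is looked up exactly once, a 0 that becomes 1 is not converted again to 2.

```shell
# map holds old value -> new value; values not in the table (such as -9) are left alone.
mapped=$(printf '1 RQ22067-0 -9\n2 RQ34365-4 1\n4 RQ34720-1 0\n' |
  awk 'BEGIN { map[0] = 1; map[1] = 2 }
       ($3 in map) { $3 = map[$3] }
       1')
printf '%s\n' "$mapped"
# 1 RQ22067-0 -9
# 2 RQ34365-4 2
# 4 RQ34720-1 1
```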

CodePudding user response:

With this code in your question:

awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno

  1. you're running both commands on the same input file and writing their output to the same output file, so only the output of the 2nd script will be present in the output, and

  2. you're trying to change 0 to 1 first and THEN change 1 to 2, so the $3s that start out as 0 would end up as 2. You need to change the order of the operations.
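
The second point can be seen on two of the sample rows. This is only an illustration of the failure, assuming both substitutions were chained in one script in the original 0-then-1 order:

```shell
# Wrong order: the 0 becomes 1, and then every 1 (old and new) becomes 2.
wrong=$(printf '2 RQ34365-4 1\n4 RQ34720-1 0\n' |
  awk '{gsub("0","1",$3); gsub("1","2",$3)} 1')
printf '%s\n' "$wrong"
# 2 RQ34365-4 2
# 4 RQ34720-1 2
```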

This is what you should be doing, using your existing code:

awk '{gsub("1","2",$3); gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno

For example:

$ awk '{gsub("1","2",$3); gsub("0","1",$3)}1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1

The gsub() calls should also just be sub() calls, as you only want to perform each substitution once per field, and you don't need to enclose the numbers in quotes, so you could just do:

awk '{sub(1,2,$3); sub(0,1,$3)}1' file
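
One caveat that goes beyond the sample data: sub() and gsub() treat their first argument as a regular expression that can match anywhere inside the field. The sketch below assumes column 3 could hold a multi-digit value such as 10 (the sample file has none, and the ID RQ00000-0 is made up for illustration) and contrasts substitution with a whole-field comparison.

```shell
# Substitution matches inside the field: 10 -> 20 -> 21.
by_regex=$(printf '5 RQ00000-0 10\n' |
  awk '{sub(/1/, "2", $3); sub(/0/, "1", $3)} 1')

# Comparing the whole field leaves 10 untouched.
by_value=$(printf '5 RQ00000-0 10\n' |
  awk '{
    if ($3 == 1)      $3 = 2
    else if ($3 == 0) $3 = 1
  } 1')

printf '%s\n%s\n' "$by_regex" "$by_value"
# 5 RQ00000-0 21
# 5 RQ00000-0 10
```

If column 3 only ever contains -9, 0, 1 or 2, both approaches give the same result.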

CodePudding user response:

This might work for you (GNU sed):

sed -E 's/\S+/\n&\n/3;h;y/01/12/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n.*/\2\1/' file

Surround 3rd column by newlines.

Make a copy.

Replace all 0's by 1's and all 1's by 2's.

Append the original.

Pattern match on newlines and replace the 3rd column in the original by the 3rd column in the amended line.

CodePudding user response:

Also with awk:

awk 'NR > 1 {s=$3;sub(/1/,"2",s);sub(/0/,"1",s);$3=s} 1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1

The substitutions are made with sub() on a copy of $3, and the modified copy is then assigned back to $3. The NR > 1 condition skips the first line, so the -9 row is printed unchanged.