How to replace "00" in data with "N/A" skipping first row in sed?

Time:04-21

I'm working with GWAS data. My data looks like this:

IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       00           AG        GT            AK          00
32      AG           GG        AA            00          AT
98      TT           AA        00            AG          AA       
3       GG           AG        00            GT          GG

Desired Output:

IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       N/A          AG        GT            AK          N/A
32      AG           GG        AA            N/A         AT
98      TT           AA        N/A           AG          AA       
3       GG           AG        N/A            GT          GG

I'm trying to replace "00" with "N/A", but because the header row also contains 00, it is replaced there too, giving kgp11N/A4425, rs11274N/A5, kgp183N/A5, and so on. The command I used:

sed 's~00~N/A~g' allSNIPsFinaldata.csv 

How can I skip the first (header) row and apply the replacement only to the remaining rows?

CodePudding user response:

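With two capture groups you can use this sed command. It does not skip the header explicitly; it leaves the header alone because it only matches 00 when it is a standalone field: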

sed -E 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g' file

IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       N/A           AG        GT            AK          N/A
32      AG           GG        AA            N/A          AT
98      TT           AA        N/A            AG          AA
3       GG           AG        N/A            GT          GG

Details:

  • (^|[[:blank:]]): Match the start of the line or a blank (space or tab) in capture group #1
  • 00: Match the literal 00
  • ([[:blank:]]|$): Match a blank or the end of the line in capture group #2
  • \1N/A\2: Replacement that puts back the value of capture group #1, followed by N/A, followed by the value of capture group #2
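
One caveat with this pattern, sketched below on a hypothetical single-space input (not the question's data, where fields are padded with several spaces): the blank captured in group #2 is consumed by the match, so a 00 that directly follows another 00 after a single blank can be skipped in one pass. Looping with a label and the t command until nothing more is substituted is one possible workaround.

```shell
# Hypothetical single-space input, not taken from the question's data.
line='00 00 00'

# One pass with the g flag: the blank after the first 00 is consumed by the
# match, so the middle 00 has no leading blank left to match against.
single_pass=$(printf '%s\n' "$line" |
  sed -E 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g')

# Replace one match at a time and branch back to :a while a substitution
# was made, so the previously skipped field is picked up on a later pass.
looped=$(printf '%s\n' "$line" |
  sed -E -e ':a' -e 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~' -e 'ta')

printf '%s\n%s\n' "$single_pass" "$looped"
```

With the multi-space padding shown in the question, the single-pass command should be enough.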

CodePudding user response:

You may specify an address to select the line(s) to apply the command to. Thus you might choose to exclude the first line like this:

sed '1!s~00~N/A~g' allSNIPsFinaldata.csv

As a side note, your example isn't actually CSV despite the file name: the header is comma-delimited, but the rest of the file uses spaces.
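
As a rough sketch of how addresses behave, the snippet below runs the command on a small hypothetical file modelled on the question's data. It also tries a regex address, /^IID/!, which assumes the header is the only line starting with IID; that assumption comes from the sample, not from any guarantee about the real file.

```shell
# Hypothetical sample file modelled on the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
IID, kgp11004425, rs11274005
1       00           AG
32      AG           00
EOF

# Line-number address: apply s to every line except line 1.
by_number=$(sed '1!s~00~N/A~g' "$tmp")

# Regex address: apply s to every line that does not start with IID.
by_pattern=$(sed '/^IID/!s~00~N/A~g' "$tmp")

printf '%s\n' "$by_number"
rm -f "$tmp"
```

Both forms leave the header untouched here; the line-number form is the safer choice when the header text is not known in advance.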

CodePudding user response:

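Using sed with word boundaries (\< and \> are GNU extensions that POSIX does not specify, so other sed implementations may not support them):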

$ sed 's|\<00\>|N/A|g' input_file
IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       N/A           AG        GT            AK          N/A
32      AG           GG        AA            N/A          AT
98      TT           AA        N/A            AG          AA
3       GG           AG        N/A            GT          GG

CodePudding user response:

You can also skip the first row by applying the substitution from the second line through the last:

sed '2,$s~00~N/A~g' allSNIPsFinaldata.csv

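If you don't want partial word matches (for example, the 00 inside an IID such as 100), you can add word boundaries around the 00, for instance \<00\> in GNU sed, or the (^|[[:blank:]]) and ([[:blank:]]|$) groups shown in the first answer.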
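
A minimal sketch of combining the 2,$ address with blank-or-edge anchors, using a hypothetical sample file. The row with IID 100 is made up to show the difference and does not come from the question.

```shell
# Hypothetical sample file; the IID 100 row is invented for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
IID, kgp11004425, rs11274005
100     00           AG
32      AG           00
EOF

# Address only: every 00 after the header is replaced, including the one
# inside the IID 100.
plain=$(sed '2,$ s~00~N/A~g' "$tmp")

# Address plus anchors: only a 00 delimited by blanks or line edges is replaced.
anchored=$(sed -E '2,$ s~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g' "$tmp")

printf '%s\n\n%s\n' "$plain" "$anchored"
rm -f "$tmp"
```

The anchored form keeps the IID 100 intact while still leaving the header alone.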
