I'm working with GWAS data. My data looks like this:
IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1 00 AG GT AK 00
32 AG GG AA 00 AT
98 TT AA 00 AG AA
3 GG AG 00 GT GG
Desired Output:
IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1 N/A AG GT AK N/A
32 AG GG AA N/A AT
98 TT AA N/A AG AA
3 GG AG N/A GT GG
I'm trying to replace "00" with "N/A", but because the header row also contains 00, it gets replaced there too, producing kgp11N/A4425, rs11274N/A5, kgp183N/A5, and so on. The bash command I used:
sed 's~00~N/A~g' allSNIPsFinaldata.csv
Can anyone please help me skip the first (header) row and apply the replacement only to the remaining rows?
CodePudding user response:
With 2 capture groups you can use this sed:
sed -E 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g' file
IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1 N/A AG GT AK N/A
32 AG GG AA N/A AT
98 TT AA N/A AG AA
3 GG AG N/A GT GG
Details:
(^|[[:blank:]]): Match start or a whitespace in capture group #1
00: Match 00
([[:blank:]]|$): Match end or a whitespace in capture group #2
\1N/A\2: Replacement to put back value of capture group #1 followed by N/A followed by value of capture group #2
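One caveat worth checking with this pattern: each match consumes the blank that follows it, so when two 00 fields sit next to each other, the second one may be left alone. Below is a minimal sketch of that case using a made-up input line (not from the original data), plus one possible workaround that repeats the substitution until nothing changes. The variable names are only for illustration.

```shell
# Made-up line with two adjacent missing genotypes (an assumption for illustration).
line='7 00 00 AG'

# Single pass: the first match swallows the blank that the second match would need.
single=$(printf '%s\n' "$line" |
  sed -E 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g')

# Workaround: label "a", substitute once, then "ta" jumps back to the label
# for as long as the substitution keeps succeeding.
looped=$(printf '%s\n' "$line" |
  sed -E -e ':a' -e 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~' -e 'ta')

printf 'single pass: %s\n' "$single"
printf 'looped:      %s\n' "$looped"
```

If the data never has two missing values side by side, the single-pass command above is enough.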
CodePudding user response:
You may specify an address to select the line(s) to apply the command to. Thus you might choose to exclude the first line like this:
sed '1!s~00~N/A~g' allSNIPsFinaldata.csv
As a side note, your example isn't actually CSV despite the file name: the header is comma-delimited, but the rest of the file uses spaces.
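To see the address in action without touching the real file, here is a small sketch that pipes a made-up three-line sample (same layout as the question, but shortened) through the same command. The variable names data and result are only for illustration.

```shell
# Made-up sample in the question's layout: one header line plus two data rows.
data='IID, kgp11004425, rs11274005
1 00 AG
32 AG 00'

# 1! means "every line except line 1", so the digits in the header are left alone.
result=$(printf '%s\n' "$data" | sed '1!s~00~N/A~g')

printf '%s\n' "$result"
```

The header should come out unchanged, while the 00 entries in the data rows become N/A.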
CodePudding user response:
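Using sed with word boundaries (\< and \> are GNU extensions, so this may not work with every sed implementation)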
$ sed 's|\<00\>|N/A|g' input_file
IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1 N/A AG GT AK N/A
32 AG GG AA N/A AT
98 TT AA N/A AG AA
3 GG AG N/A GT GG
CodePudding user response:
You can also skip the first row by applying the command from the second line through the last:
sed '2,$s~00~N/A~g' allSNIPsFinaldata.csv
If you don't want partial word matches, you can put word boundaries around the 00 in different ways.
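As one possible combination, the 2,$ address can be paired with the \< and \> word boundaries shown in an earlier answer. This is only a sketch: it assumes GNU sed (those boundaries are not POSIX), and the sample below is made up, with a 100 value added purely to show that partial matches are skipped.

```shell
# Made-up sample; the "100" field is an assumption added to show a partial match.
data='IID, kgp11004425, rs11274005, kgp183005
1 00 AG 100
32 AG 00 GT'

# 2,$ limits the command to line 2 through the last line;
# \<00\> only matches 00 when it stands alone as a word (GNU sed).
result=$(printf '%s\n' "$data" | sed '2,$s~\<00\>~N/A~g')

printf '%s\n' "$result"
```

With the boundaries in place, 100 stays as it is, and the address keeps the header out of the substitution even if a standalone 00 ever appeared there.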