I'm working with GWAS data. My data looks like this:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA
3,GG,AG,00,GT,GG
Desired Output:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
sed '1!s~00~N/A~g' allSNIPsFinaldata.csv
The above command skips the first row but not the first column, so IID values such as 100, 200, and 300 come out as 1N/A, 2N/A, and 3N/A. Can anyone please help me exclude both the first row and the first column while performing the above replacement?
I also tried this:
awk 'NR>1{$0=$0","; gsub(/,00,/,",NA,"); sub(/,$/,"")} 1' file
Note: I think this command treats 00 as an integer, but it is the string "00", and as a result the command does not replace "00" with "NA". Can anyone please help with a command that replaces "00" with "NA"?
CodePudding user response:
You can use
sed -E '1!{:a;s~^([^,]*,.*)00~\1N/A~;ta;}' file > newfile
Details:
-E - enables POSIX ERE syntax
1! - match on all lines but the first
:a - set an a label
s~^([^,]*,.*)00~\1N/A~ - find and capture into Group 1 any zero or more chars other than a comma (at the string start), then a comma, and then any text; then just consume 00, and replace the match with the Group 1 contents followed by N/A
ta - upon a successful replacement, go back to the a label and try the substitution again.
See the online demo:
#!/bin/bash
s='IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA
3,GG,AG,00,GT,GG'
sed -E '1!{:a;s~^([^,]*,.*)00~\1N/A~;ta;}' <<< "$s"
Output:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
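One caveat: the pattern above matches 00 anywhere after the first comma, so a value such as 1005 in a later column would also be changed. The genotype fields in the question are two-letter codes, so this may never come up. If it does, a minimal sketch (assuming GNU sed, as in the command above; the row with IID 7 is hypothetical data added only for illustration) anchors the match on field boundaries instead:

```shell
#!/bin/bash
# Sample data; the last row (IID 7) is hypothetical and not from the question.
s='IID,kgp11004425,rs11274005,kgp183005
100,00,AG,00
7,1005,00,00'
# ,00 must be followed by a comma or the end of the line, so only whole
# fields match. The required leading comma keeps column 1 out of reach, and
# the :a/ta loop handles adjacent 00 fields, which one pass with the g flag
# would miss because the shared comma is already consumed.
result=$(printf '%s\n' "$s" | sed -E '1!{:a;s~,00(,|$)~,N/A\1~;ta;}')
printf '%s\n' "$result"
```

Here 1005 is left alone, both 00 fields in each data row become N/A, and the IID 100 is untouched.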
CodePudding user response:
One awk idea:
awk '
BEGIN { FS=OFS="," }
NR>1 { for (i=2;i<=NF;i++) # skip 1st line; loop through fields 2 to NF
if ($i == "00") # if field = "00" then ...
$i="N/A" # replace with "N/A"
}
1 # print current line
' file
Or as a one-liner:
awk 'BEGIN {FS=OFS=","} NR>1{for (i=2;i<=NF;i++) if ($i == "00") $i="N/A"}1' file
This generates:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
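If the $i == "00" test seems to miss the last column on real data, one possible cause is Windows (CRLF) line endings. This is an assumption, since the question does not say how the file was produced. In that case the last field is "00" followed by a carriage return, which is not equal to "00". A minimal sketch that strips a trailing carriage return before comparing, using hypothetical CRLF input built with printf:

```shell
#!/bin/bash
# Hypothetical input with CRLF line endings, generated here for the demo.
result=$(printf 'IID,kgp11004425,rs11274005\r\n1,00,00\r\n32,AG,00\r\n' |
  awk 'BEGIN {FS=OFS=","}
       {sub(/\r$/, "")}   # drop a trailing carriage return, if present
       NR>1 {for (i=2;i<=NF;i++) if ($i == "00") $i="N/A"}
       1')
printf '%s\n' "$result"
```

The sub() call modifies $0, so awk re-splits the fields before the loop runs. On files with plain LF endings it does nothing.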