I'm working with GWAS data. My data looks like this:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA
3,GG,AG,00,GT,GG
Desired Output:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
Here I'm trying to replace "00" with "N/A". However, "00" also appears in the first row (the header) and in the first column (IID). The command I used:
sed '1!s~00~N/A~g' allSNIPsFinaldata.csv
The above command skips the first row but not the first column, so IID values such as 100, 200, and 300 became 1N/A, 2N/A, and 3N/A. How can I exclude both the first row and the first column and still perform the replacement?
CodePudding user response:
With your shown samples, please try the following awk program. It uses the gensub function, so it requires GNU awk.
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
print
next
}
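{
secondPart=gensub(/^[^,]*,(.*)/,"\\1","g")   # everything after the first column
sub(/^00,/,"N/A,",secondPart)                # 00 in the first remaining field
gsub(/,00,/,",N/A,",secondPart)              # 00 in the middle fields
gsub(/,00,/,",N/A,",secondPart)              # second pass: adjacent 00 fields share a comma
sub(/,00$/,",N/A",secondPart)                # 00 in the last field
print $1 OFS secondPart
}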
' Input_file
CodePudding user response:
Assuming the columns are separated by whitespace characters such as spaces or tabs, would you please try:
sed -E '1!s~([[:space:]])00([[:space:]]|$)~\1N/A\2~g' allSNIPsFinaldata.csv
- The address 1! skips the 1st row.
- The regex ([[:space:]])00([[:space:]]|$) matches the 00 string preceded by a whitespace character (which prevents a match in the 1st column) and followed by a whitespace character or the end of the line.
CodePudding user response:
An awk:
$ awk '
BEGIN {
FS=OFS="," # set field delimiters to a comma
}
FNR>1 { # process records after the first
for(i=1;i<=NF;i++) # iterate all fields (maybe start from 2nd?)
if($i=="00") # if field is 00
$i="N/A" # replace
}1' file # output
Output:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
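If the IID column must never be modified (for example, if an ID could itself be exactly 00), the loop above can start at field 2 instead. A minimal sketch of that variant follows; the function name mask_missing and the sample rows are made up for illustration and are not from the question's data:

```shell
# Replace fields that are exactly "00" with "N/A", skipping the header
# row and the first column. Reads comma-separated data from stdin.
mask_missing() {
  awk '
    BEGIN { FS = OFS = "," }          # comma-separated in and out
    FNR > 1 {                         # leave the header row untouched
      for (i = 2; i <= NF; i++)       # start at 2 to skip the IID column
        if ($i == "00") $i = "N/A"    # exact string match on the whole field
    }
    1                                 # print every record
  '
}

# Hypothetical sample: the first data row has an IID of 00 that must survive.
printf 'IID,snp1,snp2\n00,00,AG\n100,GG,00\n' | mask_missing
```

Because the comparison is on the whole field, an IID such as 100 is safe either way; starting at 2 only matters when an IID is literally 00.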
CodePudding user response:
If you want to replace 00 only in the other columns, you have to add a delimiter to your pattern (I am assuming a space in my command):
sed -i 's~ 00 ~ N/A ~g' allSNIPsFinaldata.csv
CodePudding user response:
This might work for you (GNU sed):
sed -E '1!{s/,00(,|$)/,N\/A\1/g;s//,N\/A\1/g}' file
If the line is not the first and a , is followed by 00 and then by , or end-of-line, the 00 is replaced by N/A and the other parts of the match remain unchanged.
This substitution is global but needs to be applied twice because matches may overlap: the comma that ends one match is also the comma that would start the next.
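To see the overlap in practice, here is a small sketch comparing one pass with two on a row that has two adjacent 00 fields. It assumes GNU sed; the sample row and the function names are made up for illustration:

```shell
# Hypothetical sample: a header plus one row with two adjacent "00" fields.
sample() { printf 'IID,snp1,snp2,snp3\n100,00,00,AG\n'; }

# One pass: the comma after the first 00 is consumed by the match,
# so the second 00 is no longer preceded by an unmatched comma.
one_pass() { sample | sed -E '1!s/,00(,|$)/,N\/A\1/g'; }

# Two passes: the empty regex in s//.../ reuses the previous pattern
# and catches the field the first pass skipped.
two_pass() { sample | sed -E '1!{s/,00(,|$)/,N\/A\1/g;s//,N\/A\1/g}'; }

one_pass
two_pass
```

With one pass the data row comes out as 100,N/A,00,AG; with two passes it becomes 100,N/A,N/A,AG. The IID 100 is untouched in both cases because its 00 is not preceded by a comma.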