Home > Enterprise >  How to replace 00 with Na excluding first row & first column using bash in comma separated file
How to replace 00 with Na excluding first row & first column using bash in comma separated file

Time:04-29

I'm working with GWAS data, My data looks like this:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA       
3,GG,AG,00,GT,GG

Desired Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA       
3,GG,AG,N/A,GT,GG

Here I'm trying to replace "00" with "N/A", but since I have 00 in the first row & First Column (IID), the command I used:

sed '1!s~00~N/A~g' allSNIPsFinaldata.csv 

The above command excludes the first row but not the first column as a result I got IID Values 100, 200, and 300 as 1N/A, 2N/A, and 3N/A. Can anyone please help "how to exclude the first row & First Column as well and perform the above operation. please help

CodePudding user response:

With your shown samples in GNU awk using its gensub function, please try following awk program.

awk '
BEGIN{
  FS=OFS=","
}
FNR==1{
  print
  next
}
{
  secondPart=gensub(/^[^,]*,(.*)/,"\\1","g")
  sub(/^00,/,"N/A,",secondPart)
  gsub(/,00,/,",N/A,",secondPart)
  sub(/,00$/,",N/A",secondPart)
  print $1 OFS secondPart
}
'  Input_file

CodePudding user response:

Assuming the columns are separated by space characters such as whitespace or tab character, would you please try:

sed -E '1!s~([[:space:]])00([[:space:]]|$)~\1N/A\2~g' allSNIPsFinaldata.csv
  • The address 1! skips the 1st row.
  • The regex ([[:space:]])00([[:space:]]|$) matches the 00 string preceded by a space character (it prevents to match the 1st column) and followed by a space character or the end of the line.

CodePudding user response:

An awk:

$ awk '
BEGIN {
    FS=OFS=","          # set field delimiters to a comma
}
FNR>1 {                 # process records after the first
    for(i=1;i<=NF;i  )  # iterate all fields (maybe start from 2nd?)
        if($i=="00")    # if field is 00
            $i="N/A"    # replace
}1' file                # output

Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA       
3,GG,AG,N/A,GT,GG

CodePudding user response:

If you want to repolace only 00 in other columns you have to add a delimiter (I am assuming space in my command) in your pattern: sed -i 's~ 00 ~ N/A ~g' allSNIPsFinaldata.csv

CodePudding user response:

This might work for you (GNU sed):

sed -E '1!{s/,00(,|$)/,N\/A\1/g;s//,N\/A\1/g}' file

If not the first line and , followed by 00 followed by , or end-of-line, replace the 00 by N/A and other parts of the match remain unchanged.

This substitution is global but needs to be implemented twice because the patterns may overlap.

  • Related