I have multiple alignment format (MAF) files that look like this:
##maf version=1
a score=-1274
s Chr10 34972197 2927 190919061 AACCTTGGGG
s Chr11 36777315 2442 244384623 AACCTTGGGG
a score=-60687
s Chr1 81897274 61972 159217232 CGTTTTCCCGG
s Chr1 33997294 32248 200980605 CGTTTTCCCGG
I would like to modify the second column of these files for lines that start with "s", to have something like this:
##maf version=1
a score=-1274
s species1.Chr10 34972197 2927 190919061 AACCTTGGGG
s species2.Chr11 36777315 2442 244384623 AACCTTGGGG
a score=-60687
s species1.Chr1 81897274 61972 159217232 CGTTTTCCCGG
s species2.Chr1 33997294 32248 200980605 CGTTTTCCCGG
Based on the idea in https://unix.stackexchange.com/questions/154220/adding-a-character-to-every-other-text-line, I am trying things like this:
awk '$1 == "s" {print ((NR%2)? "species1.":"") $0}'
But I am still far from reaching my objective. Do you know how I could achieve this?
CodePudding user response:
Assumptions:
- the spacing between fields is to be preserved
One awk idea (requires GNU awk for the four-argument split()):
awk '
!/^s/ { print; sfx=0 }                    # if line does not start with "s" then print line and reset sfx counter
/^s/  { n=split($0,a,FS,seps)             # if line starts with "s" then split current line; key is to save each separator as a separate seps[] array entry
        a[2]="species" (++sfx) "." a[2]   # increment counter and add prefix to value in 2nd field
        for (i=1;i<=n;i++)                # loop through all field/separator pairs
            printf "%s%s", a[i], seps[i]  # print each field/separator
        print ""                          # terminate line
      }
' maf.dat
This generates:
##maf version=1
a score=-1274
s species1.Chr10 34972197 2927 190919061 AACCTTGGGG
s species2.Chr11 36777315 2442 244384623 AACCTTGGGG
a score=-60687
s species1.Chr1 81897274 61972 159217232 CGTTTTCCCGG
s species2.Chr1 33997294 32248 200980605 CGTTTTCCCGG
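The four-argument split() is a GNU awk extension. Where only a POSIX awk is available, a similar effect can probably be had with sub(), since "&" in the replacement re-inserts the matched text and so leaves the original spacing alone. This is only a minimal sketch: prefix_species is a made-up helper name, and it assumes the counter should restart on every line that does not begin with "s".

```shell
# prefix_species is a hypothetical helper name, not part of the answer above.
# Usage: prefix_species file.maf   (or pipe MAF text into it)
prefix_species() {
    awk '
        /^s[ \t]/ {
            n++                                # position of this "s" line within its block
            sub(/^s[ \t]+/, "&species" n ".")  # "&" re-inserts the matched "s" plus whitespace
            print
            next
        }
        { n = 0; print }                       # any other line restarts the count
    ' "$@"
}
```

Because the count is kept per block, this does not depend on line numbers and would also number a third or fourth "s" line in a block as species3, species4, and so on.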
CodePudding user response:
Perl to the rescue!
perl -pe 'if (/^s/) { s/Chr/species$x.Chr/; $x++ } else { $x = 1 }' file.maf
- -p reads the input line by line and outputs each line after processing;
- If the line starts with s, it prefixes Chr with species and the current number stored in $x, then increments it;
- Otherwise, it sets $x to 1.
CodePudding user response:
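awk '
# relies on the first "s" line of each block falling on an odd line number (NR)
{ out = $0 }
/^s / { s = ((NR % 2) ? "species1." : "species2.") $2; sub($2, s, out) }
{ print out }
' file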
##maf version=1
a score=-1274
s species1.Chr10 34972197 2927 190919061 AACCTTGGGG
s species2.Chr11 36777315 2442 244384623 AACCTTGGGG
a score=-60687
s species1.Chr1 81897274 61972 159217232 CGTTTTCCCGG
s species2.Chr1 33997294 32248 200980605 CGTTTTCCCGG
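Whichever variant is used, one way to sanity-check the result is to strip the added prefix again and compare with the original file. The following is a rough sketch rather than a definitive test: check_prefix_roundtrip is a hypothetical helper name, and it assumes the prefixes have the form speciesN. and that the original sequence names do not themselves start with such a prefix.

```shell
# check_prefix_roundtrip is a hypothetical helper name.
# $1 = original MAF file, $2 = modified MAF file
# Exit status 0 means that removing the "speciesN." prefix reproduces the original.
check_prefix_roundtrip() {
    sed 's/^\(s[[:space:]]\{1,\}\)species[0-9]\{1,\}\./\1/' "$2" | cmp -s - "$1"
}
```

It could be called as check_prefix_roundtrip file.maf modified.maf && echo "only the prefix changed", where modified.maf stands for wherever the output was saved.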