Conversion from characters [A, H, B] to numeric [1, 0, -1]

Time:10-17

My goal is to convert a file containing the values 2, 1, 0 into a file containing 1, 0, -1, going through the letters A, H, B. For example:

Infile.txt        Temp.txt           Final.txt
2 2 2 1 1 1 0 0   A A A H H H B B    1 1 1 0 0 0 -1 -1

I was able to convert the numbers (2 1 0) to characters (A H B) with the following command:

cut -f2- Infile.txt | sed '1,1d' | sed 's/2/A/g' | sed 's/1/H/g' | sed 's/0/B/g' > Temp.txt

However, I could not convert (A H B) to (1 0 -1), so I am stuck with Temp.txt.

I would appreciate any solution. Thanks!

CodePudding user response:

Do you really need the temp step? You can map directly between the three representations with a few regex gsub() calls:

echo '2 1 1 1 0 1 2 1 1 2 0 2 0 2 1 2 1 0 1 0 0 1 2 1 0 2 2 2 2 1 0 0 2 2 0 2 0 2 0 1 2 0 1 1 0 2 0 1 1 1 0 0 2 0 0 2 1' | 

mawk '{ print }
gsub(+_,__) + gsub(!_,+_) + gsub(__,"-"!_) \
                          + gsub(!_+!_,!_)^_' __='\2' |
gtee >( gpaste - | column -t |  gsed -zE 's/^|\n/&\n/g' >&2;) |
mawk NF=NF FS='[^0-9-]+' OFS='\n' | nonEmpty | rs -t -c$'\n' 0 2 | uniqC

2  1  1  1  0   1  2  1  1  2  0   2  0   2  1  2  1  0   1  0   0   1  2  1  0   2  2  2  2  1  0   0   2  2  0   2  0   2  0   1  2  0   1  1  0   2  0   1  1  1  0   0   2  0   0   2  1

1  0  0  0  -1  0  1  0  0  1  -1  1  -1  1  0  1  0  -1  0  -1  -1  0  1  0  -1  1  1  1  1  0  -1  -1  1  1  -1  1  -1  1  -1  0  1  -1  0  0  -1  1  -1  0  0  0  -1  -1  1  -1  -1  1  0

              19 0   -1
              19 1   0
              19 2   1
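
If the Temp.txt step is kept anyway, the letters can also be mapped to numbers field by field. What follows is only a minimal sketch and not part of the answer above: the function name letters_to_numbers is hypothetical, and it assumes the tokens are separated by whitespace. Any token other than A, H or B is passed through unchanged.

```shell
# Hypothetical helper: map A/H/B tokens to 1/0/-1 using a lookup table.
letters_to_numbers() {
    awk 'BEGIN { map["A"] = "1"; map["H"] = "0"; map["B"] = "-1" }
         {
             # Replace each field that has an entry in the table.
             for (i = 1; i <= NF; i++)
                 if ($i in map) $i = map[$i]
         }
         1'
}

printf 'A A A H H H B B\n' | letters_to_numbers
```

Because awk rebuilds a line whenever a field is assigned, the output of changed lines is separated by single spaces regardless of the input spacing.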

CodePudding user response:

OP has mentioned in a comment the source file is a 20,000 x 500 (row x column) matrix of the digits 2, 1 and 0.

Create a 20000 x 501 (rows x columns) test matrix, where each row repeats "2 1 0" 167 times:

awk '
BEGIN { for (i=1;i<=20000;i++) {
            sep=""
            for (j=1;j<=167;j++) {
                printf "%s2 1 0", sep
                sep=" "
            }
            print ""
        }
      }
' > matrix.dat

$ head -5 matrix.dat | cut -c1-30
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0

One awk/gsub() idea:

awk '{ gsub(/1/,9)
       gsub(/2/,1)
       gsub(/0/,-1);
       gsub(/9/,0)
     }
1
' matrix.dat > matrix.awk1.out

One awk/loop idea:

awk '{ for (i=1;i<=NF;i++)
           $i=$i-1
     }
1
' matrix.dat > matrix.awk2.out

One sed idea:

sed 's/1/9/g;s/2/1/g;s/0/-1/g;s/9/0/g' matrix.dat > matrix.sed.out

These all generate the same result:

$ diff matrix.awk1.out matrix.awk2.out
$ diff matrix.awk2.out matrix.sed.out
$ head -5 matrix.awk1.out | cut -c1-35
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
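
The temporary 9 in the gsub() and sed versions is what keeps the substitutions from feeding into each other. A small sketch of the difference follows; the function names naive_chain and placeholder_chain are made up for illustration, and the placeholder variant assumes the digit 9 never occurs in the input.

```shell
# Without a placeholder, each substitution also rewrites the output of the
# previous one: 2 becomes 1, then every 1 becomes 0, then every 0 becomes -1.
naive_chain() { sed 's/2/1/g;s/1/0/g;s/0/-1/g'; }

# Parking the original 1s as 9 first keeps the three mappings independent.
placeholder_chain() { sed 's/1/9/g;s/2/1/g;s/0/-1/g;s/9/0/g'; }

printf '2 1 0\n' | naive_chain         # prints: -1 -1 -1
printf '2 1 0\n' | placeholder_chain   # prints: 1 0 -1
```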

Run times:

  • system: cygwin (in a VM), awk 5.1.1, sed 4.8
  • 5.5 secs : awk/gsub()
  • 3.9 secs : awk/loop
  • 5.9 secs : sed

CodePudding user response:

Use scan to read the numbers file, subtract 1 and write to file.

{scan(text = "2 2 2 1 1 1 0 0") - 1L} |> as.character() |> writeLines("~/Temp/Final.txt")

Created on 2022-10-16 with reprex v2.0.2


Edit

Here is a way to read the numeric matrix, subtract 1 and write the result.

infile <- "~/Temp/Infile.txt"
x <- scan(infile, nlines = 1)
nc <- length(x)

x <- scan(infile) - 1L

# scan() reads the file row by row, so the matrix must be filled by row
write.table(matrix(x, ncol = nc, byrow = TRUE), "~/Temp/Final.txt",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

rm(x)   # final clean up

CodePudding user response:

tr is a fair choice, but it maps single characters only, so it cannot output the two-character symbol '-1' directly.

tr "012" "-01" < input

This should translate about as fast as anything you can type in a few characters at a shell prompt. You can then pipe the result to sed

sed 's/-/-1/g'  

to expand the lone '-' into '-1'.
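
Put together, the two steps might look like the sketch below. The function name shift_digits is hypothetical, and the tr sets are written in the order 2, 1, 0 (the same mapping as above) so that the dash comes last in the second set.

```shell
# Hypothetical helper: 2 -> 1, 1 -> 0, 0 -> "-" with tr, then "-" -> "-1" with sed.
shift_digits() {
    tr '210' '10-' | sed 's/-/-1/g'
}

printf '2 2 2 1 1 1 0 0\n' | shift_digits   # prints: 1 1 1 0 0 0 -1 -1
```

This relies on the input holding no dashes of its own; any that were present would also be expanded to -1.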
