My goal is to convert a file containing the values 2, 1, 0 into a file containing 1, 0, -1, going through the intermediate codes A, H, B. For example:
Infile.txt         Temp.txt           Final.txt
2 2 2 1 1 1 0 0    A A A H H H B B    1 1 1 0 0 0 -1 -1
I was able to convert the numbers (2 1 0) to characters (A H B) with the following code:
cut -f2- Infile.txt | sed '1,1d' | sed 's/2/A/g' | sed 's/1/H/g' | sed 's/0/B/g' > Temp.txt
However, I could not convert (A H B) to (1 0 -1), so I am stuck at Temp.txt.
I would appreciate any solution. Thanks!
CodePudding user response:
Do you really need the temp step? You can map directly between the three sets of values with just a little regex and gsub():
echo '2 1 1 1 0 1 2 1 1 2 0 2 0 2 1 2 1 0 1 0 0 1 2 1 0 2 2 2 2 1 0 0 2 2 0 2 0 2 0 1 2 0 1 1 0 2 0 1 1 1 0 0 2 0 0 2 1' |
mawk '{ print }
gsub( _,__) gsub(!_, _) gsub(__,"-"!_) \
gsub(!_ !_,!_)^_' __='\2' |
gtee >( gpaste - | column -t | gsed -zE 's/^|\n/&\n/g' >&2;) |
mawk NF=NF FS='[^0-9-] ' OFS='\n' | nonEmpty | rs -t -c$'\n' 0 2 | uniqC
2 1 1 1 0 1 2 1 1 2 0 2 0 2 1 2 1 0 1 0 0 1 2 1 0 2 2 2 2 1 0 0 2 2 0 2 0 2 0 1 2 0 1 1 0 2 0 1 1 1 0 0 2 0 0 2 1
1 0 0 0 -1 0 1 0 0 1 -1 1 -1 1 0 1 0 -1 0 -1 -1 0 1 0 -1 1 1 1 1 0 -1 -1 1 1 -1 1 -1 1 -1 0 1 -1 0 0 -1 1 -1 0 0 0 -1 -1 1 -1 -1 1 0
19 0 -1
19 1 0
19 2 1
CodePudding user response:
OP has mentioned in a comment that the source file is a 20,000 x 500 (rows x columns) matrix of the digits 2, 1 and 0.
Create a 20000 x 501 (rows x columns) matrix:
awk '
BEGIN { for (i=1;i<=20000;i++) {
            sep=""
            for (j=1;j<=167;j++) {
                printf "%s2 1 0", sep
                sep=" "
            }
            print ""
        }
      }
' > matrix.dat
$ head -5 matrix.dat | cut -c1-30
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
2 1 0 2 1 0 2 1 0 2 1 0 2 1 0
One awk/gsub() idea:
awk '{ gsub(/1/,9)
gsub(/2/,1)
gsub(/0/,-1);
gsub(/9/,0)
}
1
' matrix.dat > matrix.awk1.out
One awk/loop idea:
awk '{ for (i=1;i<=NF;i++)
           $i=$i-1
     }
1
' matrix.dat > matrix.awk2.out
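The question's own pipeline drops the first line and the first column (cut -f2- and sed '1,1d') before converting. As a rough sketch, the same loop idea can do that stripping and the subtraction in one pass. This assumes a tab-delimited input with one header row and one label column, which is inferred from the question's cut/sed usage and not confirmed; strip_and_shift is a hypothetical helper name.

```shell
# Hypothetical helper: skip the header row and the first (label) column,
# then print every remaining value minus 1, separated by single spaces.
# Assumes tab-delimited input, as implied by the question's "cut -f2-".
strip_and_shift() {
  awk -F'\t' '
    NR == 1 { next }                       # drop the header row
    {
      for (i = 2; i <= NF; i++)            # start at 2 to drop the label column
        printf "%s%d", (i > 2 ? " " : ""), $i - 1
      print ""                             # end the output row
    }'
}

# Small made-up sample, not the OP's data
printf 'id\tc1\tc2\tc3\nr1\t2\t1\t0\n' | strip_and_shift
```

On the real file this would be run as strip_and_shift < Infile.txt > Final.txt, with no Temp.txt in between.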
One sed idea:
sed 's/1/9/g;s/2/1/g;s/0/-1/g;s/9/0/g' matrix.dat > matrix.sed.out
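The order of the substitutions matters here: 1 is first parked on the placeholder 9 so that the 1s produced from 2 are not rewritten again by a later step. The sketch below contrasts a naive ordering with the placeholder ordering on the sample row from the question. The function names are made up for illustration, and the placeholder only works if 9 never occurs in the data.

```shell
sample='2 2 2 1 1 1 0 0'

# Naive order: each substitution also rewrites the output of the previous one,
# so every value ends up as -1.
naive() { sed 's/2/1/g;s/1/0/g;s/0/-1/g'; }

# Placeholder order, same substitutions as the sed command above:
# 1 -> 9 (parked), 2 -> 1, 0 -> -1, then 9 -> 0.
with_placeholder() { sed 's/1/9/g;s/2/1/g;s/0/-1/g;s/9/0/g'; }

printf '%s\n' "$sample" | naive             # -1 -1 -1 -1 -1 -1 -1 -1
printf '%s\n' "$sample" | with_placeholder  # 1 1 1 0 0 0 -1 -1
```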
These all generate the same result:
$ diff matrix.awk1.out matrix.awk2.out
$ diff matrix.awk2.out matrix.sed.out
$ head -5 matrix.awk1.out | cut -c1-35
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
1 0 -1 1 0 -1 1 0 -1 1 0 -1 1 0 -1
Run times:
- system: cygwin (in a VM), awk 5.1.1, sed 4.8
- 5.5 secs : awk/gsub()
- 3.9 secs : awk/loop
- 5.9 secs : sed
CodePudding user response:
Use scan to read the numbers file, subtract 1 and write the result to a file.
{scan(text = "2 2 2 1 1 1 0 0") - 1L} |> as.character() |> writeLines("~/Temp/Final.txt")
Created on 2022-10-16 with reprex v2.0.2
Edit
Here is a way to read the numeric matrix, subtract 1 and write the result.
infile <- "~/Temp/Infile.txt"
x <- scan(infile, nlines = 1)
nc <- length(x)
x <- scan(infile) - 1L
write.table(matrix(x, ncol = nc), "~/Temp/Final.txt",
quote = FALSE, row.names = FALSE, col.names = FALSE)
rm(x) # final clean up
CodePudding user response:
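tr is a fair choice, but it maps single characters to single characters, so it cannot output the two-character symbol '-1' directly. It also reads only from standard input, so the file has to be redirected:
tr "012" "-01" < input
This should translate about as fast as anything you can type in a few characters in a shell. You could pipe the result to sed
sed 's/-/-1/g'
to expand the representation.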
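Putting the tr and sed steps together gives a one-line pipeline, and the same pattern also covers the step the question was stuck on (A H B to 1 0 -1). This is only a sketch: the function names are hypothetical, and it assumes the input contains no '-' characters and, for the second function, no other A, H or B letters (for example in a header).

```shell
# 2 1 0 -> 1 0 -1: tr maps 0 to a bare '-', then sed expands '-' to '-1'.
num_to_final() { tr '012' '-01' | sed 's/-/-1/g'; }

# A H B -> 1 0 -1: same idea, starting from the intermediate Temp.txt codes.
ahb_to_final() { tr 'AHB' '10-' | sed 's/-/-1/g'; }

printf '2 2 2 1 1 1 0 0\n' | num_to_final   # 1 1 1 0 0 0 -1 -1
printf 'A A A H H H B B\n' | ahb_to_final   # 1 1 1 0 0 0 -1 -1
```

With real files, these would be used as num_to_final < Infile.txt > Final.txt or ahb_to_final < Temp.txt > Final.txt.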