I have a matrix like data in tab separated text file. For example:
a b c d
a 0.1 0.5 0.3 0.0
b 0.9 0.2 0.4 0.7
c 0.2 0.0 0.6 0.5
d 0.0 0.5 0.3 0.1
I want this matrix data in tsv file such that 1st column is row index, 2nd column is column index and 3rd column is the value. For example the first 4 lines be like:
a a 0.1
a b 0.5
a c 0.3
a d 0.0
My file is large. It is about 5000 X 5000 matrix; so the starting input file has ~5000 rows and 5000 columns. I was thinking to do it through series of text processing steps but is there some better way to do it on linux?
CodePudding user response:
Now this looks like a job for awk:
awk '
NR==1 { split($0, cols); next }
{ for(col in cols) {
print $1, cols[col], $(col 1)
}
}' \
matrix
This little awk program …
- Stores the column headers into variable
cols
. The line is split like awk would already do for other lines (split on whitespace). This is done by executing the first block only for the first line (NR==1
).next
skips all other statements and proceeds to the next line. - For all other lines, it iterates through the columns extracted from the first line (
for (col in cols)
) and then prints the first column of the current line ($1
, i.e. the row header), then prints the column header (cols[col]
), and finally prints the corresponding cell for the current row column combination.