I asked this as part of a larger question yesterday. A commenter suggested splitting it off into its own question.
I have x files. The contents of the first five files appear below. On the second row, i
is a counter so that i = 0
for the first file. Next to i
is time
and it always increases by 0.5 when i
increases by 1.
However, I have only kept every xth file. So while i = 0
for the first file that I have, i = 100
in the second file, and so on.
6 # This file is called "0.xyz" (<--the "6" is the same in all files)
i = 0, time = 0.000, k = 9000000000000
X -7.6415350292 6.0494971539 8.1919697993
Y -6.6418362233 5.9231018862 8.4056822626
Y -8.0518670684 6.3158684817 9.0061271154
X 26.8252967820 20.4661074967 17.8025744066
Y 26.4477411207 20.4071029058 16.9121571912
Y 26.4399648474 21.2950722068 18.1009273227
6 # This file is called "100.xyz"
i = 100, time = 50.000, k = 2500000000000
X -6.2423192714 -1.5704681396 -9.5648670474
Y -5.4925100813 -1.6522059045 -8.9030589772
Y -6.7765278574 -2.3616512405 -9.4776648590
X 4.1248924594 27.8487302083 -17.5400886312
Y 4.1238657681 26.9869907778 -17.9727402579
Y 5.0750649402 28.1292768156 -17.6848507559
6 # This file is called "200.xyz"
i = 200, time = 100.000, k = 3945000000000
X 19.0090162215 -5.9338939011 6.1931167954
Y 18.4748060757 -6.4905073540 5.6656446036
Y 19.2825591449 -6.4479943255 7.0179774953
X 11.0203415273 34.6029396705 2.7220660957
Y 11.1184002007 34.8398120338 1.8089008500
Y 10.3349649622 33.9509485292 2.5605794622
6 # This file is called "300.xyz"
i = 300, time = 150.000, k = 2341000000000
X -7.6415350292 6.0494971539 8.1919697993
Y -6.6418362233 5.9231018862 8.4056822626
Y -8.0518670684 6.3158684817 9.0061271154
X 26.8252967820 20.4661074967 17.8025744066
Y 26.4477411207 20.4071029058 16.9121571912
Y 26.4399648474 21.2950722068 18.1009273227
6 # This file is called "400.xyz"
i = 400, time = 200.000, k = 2500000000000
X -6.2423192714 -1.5704681396 -9.5648670474
Y -5.4925100813 -1.6522059045 -8.9030589772
Y -6.7765278574 -2.3616512405 -9.4776648590
X 4.1248924594 27.8487302083 -17.5400886312
Y 4.1238657681 26.9869907778 -17.9727402579
Y 5.0750649402 28.1292768156 -17.6848507559
What I would like to do is match these files (above) with data from another file (below). In the file below, each row matches one file above according to the i
(Step). Then I would like to print the first three columns of the matching rows in the file below to a new file.
Although I have included file names, I would prefer to do the matching by using i
, not the file names.
I know how to do this by simple iteration. What is mean is, I can count and print every xth row of the file below to a new file. However, I would like to use a more sophisticated approach that specifically matches i
because these are very long files and there could potentially be rows missing so that I'd end up with a mismatch between the files above and the one I want here.
# Step Time Ax Ay Az Bx By Bz Cx Cy Cz Final
0 0.000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 3243.9033438318
1 0.500 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 3243.7307919244
2 1.000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 3243.5615395313
3 1.500 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 3243.3955453870
4 2.000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 3243.2327751298
...
100 50.000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 3243.9033438318
...
200 100.000 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 3243.7307919244
...
300 150.000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 3243.5615395313
301 150.500 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 3243.3955453870
...
400 200.000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 3243.2327751298
Example of what I would like the result to be from manipulating the file above to match the set of example files at the top of the question:
0 0.000 14.8032123290
100 50.000 14.8032123290
200 100.000 14.8029498502
300 150.000 14.8026923814
400 200.000 14.8021922354
If anyone has any hints on how to approach this, I would be grateful.
CodePudding user response:
You can use an awk
script to do this as follows:
awk 'FNR == 1 {
if ($0 ~ /^i =/) {
dataFile = 0;
step[$3 0] = FILENAME;
}
else {dataFile = 1;}
}
dataFile == 1 && step[$1] {
print $1, $2, $3;
}' *.xyz data.txt
(assumes the final file is called data.txt
; change as necessary)
FNR == 1
matches the first line of every file and will either capture the step from the xyz
file, or set a flag indicating that we've reached the data file. The $3 0
bit will just force awk
to convert the 3rd field to a number (i.e., drop the trailing comma) due to the request to perform a mathematical operation.
dataFile == 1 && step[$i]
matches lines within the data file where the step value was seen in an xyz
file.
NOTE: You must specify all the xyz
files before the final data file, so that all the steps are collected before the data file is processed.
Sorry, when I put the above solution together, I thought the # This file is called
lines were not part of the file. Modified script is here:
awk '
FNR == 2 && FILENAME != ARGV[ARGC-1] {
step[$3 0] = FILENAME;
}
FILENAME == ARGV[ARGC-1] && step[$1] {
print $1, $2, $3;
}' *.xyz data.txt
This version uses ARGV & ARGC to determine whether the "data" file is being processed.
- If not the data file and line number == 2, cache the "step" value
- If the data file, and the step is in the list, print the first 3 fields
Result:
0 0.000 14.8032123290
100 50.000 14.8032123290
200 100.000 14.8029498502
300 150.000 14.8026923814
400 200.000 14.8021922354
CodePudding user response:
Assumptions:
- matches are based on both
i
(akastep
) andtime
The step / time / Ax
data file:
$ cat match.dat
# Step Time Ax Ay Az Bx By Bz Cx Cy Cz Final
0 0.000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 3243.9033438318
1 0.500 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 3243.7307919244
2 1.000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 3243.5615395313
3 1.500 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 3243.3955453870
4 2.000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 3243.2327751298
100 50.000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 0.0000000000 0.0000000000 0.0000000000 14.8032123290 3243.9033438318
200 100.000 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 0.0000000000 0.0000000000 0.0000000000 14.8029498502 3243.7307919244
300 150.000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 0.0000000000 0.0000000000 0.0000000000 14.8026923814 3243.5615395313
301 150.500 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 0.0000000000 0.0000000000 0.0000000000 14.8024398604 3243.3955453870
400 200.000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 0.0000000000 0.0000000000 0.0000000000 14.8021922354 3243.2327751298
One awk
idea:
awk '
FNR==NR { if (FNR>1) # skip header line in 1st file
Ax[$1 OFS $2]=$3 # use step OFS time as index for Ax[] array
next
}
$1 == "i" { gsub(/,/,"") # remove commas from line so we can use normal FS delimiter to pull ...
i=$3 # field #3 (i) and ...
time=$6 # field #6 (time)
if ( (i OFS time) in Ax) # if i OFS time is an index in Ax[] array ...
print i,time,Ax[i OFS time] # print our 3 values to stdout
}
' match.dat [0-9]*.xyz
This generates:
0 0.000 14.8032123290
100 50.000 14.8032123290
200 100.000 14.8029498502
300 150.000 14.8026923814
400 200.000 14.8021922354
If OP needs the output displayed with pretty columns one idea would be to pipe the results to column
, eg:
$ awk '...' match.dat [0-9]*.xyz | column -t
0 0.000 14.8032123290
100 50.000 14.8032123290
200 100.000 14.8029498502
300 150.000 14.8026923814
400 200.000 14.8021922354
NOTE: this code is matching on exact matches of character strings; it is not matching based on numeric values; so 150 != 150.00
CodePudding user response:
This might work for you (GNU sed):
sed -En '2~100s/^((\s*\S ){3}).*/\1/p' file
Turn on extended regexp and off implicit printing -En
.
Starting from line 2 (the line following the header) use modulus 100 to select the required lines and then the substitution command to keep only the first three columns.