Finding mean value of select items in a column in bash

Time:10-08

I have a file that contains coordinates of atoms in the following format

A B C
1 2 1
some string
another line of string
  0.00  0.00  0.35
  0.33  0.99  0.37
  0.66  0.50  0.98
  0.66  0.00  0.38

A, B and C are the names of the different atoms in the system.

The next line, "1 2 1", gives the number of atoms of each type, so one A, two Bs and one C.

The following lines with three columns of floats give the Cartesian coordinates of each atom, so the first line is for A, the second and third lines are for the two Bs, and the fourth line is for C.

I want to find the average of the z coordinates of the two B atoms, i.e. Average(0.37, 0.98), and replace the z coordinate of atom C with that value, i.e. replace 0.38 with Average(0.37, 0.98).
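As a quick check of the target value, here is a minimal sketch that averages a list of numbers with awk. The helper name `mean_of` is made up for illustration, and it prints three decimals so the unrounded mean is visible.

```shell
# mean_of VALUES...: print the arithmetic mean of the arguments to 3 decimals.
# (hypothetical helper, not part of the original question)
mean_of() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { if (NR) printf "%.3f\n", s / NR }'
}

mean_of 0.37 0.98   # prints 0.675
```

So the z coordinate of C should end up as 0.675, rounded to however many decimals the file format uses.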

In the actual problem, I have a few dozen files, each with different numbers of A, B and C atoms, so I need to read the numbers in row 2 and decide which rows of column 3 to operate on. Is there an efficient way to do this in bash, awk or something similar?
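To make the "read row 2 and decide which rows to operate on" step concrete, here is a minimal sketch that prints the line range each atom type occupies. It assumes the layout of the sample (names on line 1, counts on line 2, coordinates starting on line 5); `atom_ranges` is a hypothetical helper name.

```shell
# atom_ranges FILE: print "NAME first-last" for each atom type in FILE.
# Assumes names on line 1, counts on line 2, coordinates from line 5 on.
atom_ranges() {
  awk '
    FNR == 1 { n = split($0, name) }          # atom names, e.g. A B C
    FNR == 2 {
      first = 5                               # first coordinate line (assumed)
      for (i = 1; i <= n; i++) {
        if ($i > 0) printf "%s %d-%d\n", name[i], first, first + $i - 1
        first += $i                           # next type starts right after
      }
      exit
    }
  ' "$1"
}
```

For the sample file this reports A on line 5, B on lines 6-7 and C on line 8, which are the rows any later processing has to target.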

I know that I can read the entire 3rd column of the file into an array with something like the following and then operate on it.

#!/bin/bash

array_B=( $(cut -d ' ' -f3 file ) )
printf "%s\n" "${array_B[2]}"

But that has problems introduced by the first 4 lines and then the issue of identifying the relevant rows corresponding to B. Any suggestions?
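For the array approach, one possible way around the header lines is to let awk skip them and split on whitespace, since `cut -d ' '` counts the leading spaces as empty fields. This is only a sketch, assuming four header lines as in the sample; `z_column` is a hypothetical helper name.

```shell
# z_column FILE: print the third column of the coordinate lines only.
# Assumes 4 header lines; lines with fewer than 3 fields are ignored.
z_column() {
  awk 'FNR > 4 && NF >= 3 { print $3 }' "$1"
}

# Usage sketch (bash 4+): load the z values into an array, index 0 = first atom.
#   mapfile -t z < <(z_column file)
#   printf '%s\n' "${z[1]}"    # z of the first B atom when there is one A
```

This still leaves the problem of mapping array indices to atom types, which needs the counts from row 2.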

Thanks in advance, Jacek

CodePudding user response:

Using awk:

awk '
  BEGIN { start_b = end_b = 4; total = 0 } # Initial dummy values
  FNR == 2 { # Calculate line numbers for B and C atoms
             num_b = $2; start_b = 4 + $1; end_b = start_b + num_b
           }
  FNR <= start_b { print } # Header lines and A atoms pass through unchanged
  FNR > start_b && FNR <= end_b { total += $3; print } # Sum up B z-coords
  FNR > end_b { printf "  %.2f  %.2f  %.2f\n", $1, $2, total / num_b } # Replace the C z-coords with the average of B
  ' file
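Since the question mentions a few dozen files, here is a sketch of how the same idea could be wrapped in a function and applied file by file. It is not part of the answer above: the name `replace_c_z`, the temporary-file handling, and the extra guards (skipping lines without three fields, skipping files with no B atoms) are assumptions added for illustration.

```shell
# replace_c_z FILE: rewrite FILE in place so that the z coordinate of every
# C atom becomes the mean z of the B atoms. Assumes 4 header lines with the
# counts "nA nB nC" on line 2, as in the sample file.
replace_c_z() {
  tmp=$(mktemp) || return 1
  if awk '
    FNR == 2 { num_b = $2; start_b = 4 + $1; end_b = start_b + num_b }
    FNR > 2 && FNR > start_b && FNR <= end_b { total += $3 }   # sum B z-coords
    FNR > 2 && FNR > end_b && NF == 3 && num_b > 0 {           # C atom lines
      printf "  %.2f  %.2f  %.2f\n", $1, $2, total / num_b
      next
    }
    { print }                                                  # everything else unchanged
  ' "$1" > "$tmp"; then
    cat "$tmp" > "$1"      # overwrite contents, keep the original file permissions
  fi
  rm -f "$tmp"
}

# Usage sketch (the glob is hypothetical):
#   for f in *.dat; do replace_c_z "$f"; done
```

Note that `%.2f` rounds the mean to two decimals, and that the function overwrites its input, so it is worth trying it on copies first.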