Add a column to a dataset from the quotient of two other columns

Time:05-20

I have the following dataset on Ubuntu and would like to loop over it (while or for) in bash to generate a new column holding the quotient of failed over passed subjects.

id, name, country, Continent, grade, passed, failed
1, Louise Smith, UK, Europe, 7, 5, 1
2, Okio Kiomoto, Japan, Asia, 9, 5, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3
5, Jack Thomson, Australia, Oceania, 10, 5, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1

To do this I tried the following code in a script, but I get no result because I can't find a way to add the newly generated column to the current dataset. Any ideas?

while IFS=, read _ _ _ _ _ passed failed; do
newcolumn=$($passed/$failed |bc)

done

As a guideline, the desired output would be as follows.

id, name, country, Continent, grade, passed, failed, new
1, Louise Smith, UK, Europe, 7, 5, 1, 0.2
2, Okio Kiomoto, Japan, Asia, 9, 5, 0, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2, 0.4
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3, 0.6
5, Jack Thomson, Australia, Oceania, 10, 5, 0, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1, 0.2

Thank you

CodePudding user response:

I refactored your code a bit and came up with the following:

#!/bin/bash

# create new header 
header=$(awk 'NR==1 {print}' s.dat)
printf "%s, new\n" "${header}"

# read the data rows (everything after the header); -r keeps backslashes literal
while IFS=, read -r a b c d e passed failed; do
    newcolumn=0

    # avoid divide-by-zero
    if [[ "${passed}" -ne "0" ]] ; then
        newcolumn=$(bc <<<"scale=2; ${failed} / ${passed}")
    fi

    # output data with new generated column
    printf "%s %3.2f\n" "${a}, ${b}, ${c}, ${d}, ${e}, ${passed}, ${failed}, " "${newcolumn}"
done < <(awk 'NR!=1 {print}' s.dat)

Contents of s.dat:

id, name, country, Continent, grade, passed, failed
1, Louise Smith, UK, Europe, 7, 5, 1
2, Okio Kiomoto, Japan, Asia, 9, 5, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3
5, Jack Thomson, Australia, Oceania, 10, 5, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1

Output when executing script:

id, name, country, Continent, grade, passed, failed, new
1,  Louise Smith,  UK,  Europe,  7,  5,  1,  0.20
2,  Okio Kiomoto,  Japan,  Asia,  9,  5,  0,  0.00
3,  Ralph Watson,  USA,  Northern America,  5.6,  5,  2,  0.40
4,  Mary Mcaann,  South Africa,  Africa,  4.7,  5,  3,  0.60
5,  Jack Thomson,  Australia,  Oceania,  10,  5,  0,  0.00
6,  N'dongo Mbaye,  Senegal,  Africa,  7.9,  5,  1,  0.20
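
The doubled spaces in the output above appear because splitting on a bare comma leaves the leading space attached to each field. As a rough sketch of one way around that, the function below (add_ratio_column is a made-up name, not part of the script above) keeps each row intact and peels the last two fields off with parameter expansion. It also swaps bc for bash integer arithmetic, which assumes passed and failed are whole numbers, as they are in s.dat, and truncates to two decimals the way scale=2 does.

```shell
#!/bin/bash
# Sketch only: add_ratio_column is a hypothetical helper name.
# Assumes fields are separated by ", " and that the last two columns
# (passed, failed) hold non-negative integers, as in s.dat.
add_ratio_column() {
    local infile=$1 line rest passed failed scaled
    {
        # header row: copy it and append the new column name
        IFS= read -r line
        printf '%s, new\n' "$line"

        # data rows; the || test also handles a last line without a newline
        while IFS= read -r line || [[ -n $line ]]; do
            failed=${line##*, }     # text after the last ", "
            rest=${line%, *}        # row without its last field
            passed=${rest##*, }     # second-to-last field

            scaled=0
            # avoid divide-by-zero; the ratio is scaled by 100 to keep two decimals
            if (( passed != 0 )); then
                scaled=$(( failed * 100 / passed ))
            fi
            printf '%s, %d.%02d\n' "$line" $(( scaled / 100 )) $(( scaled % 100 ))
        done
    } < "$infile"
}
```

Calling add_ratio_column s.dat > s_new.dat would then write rows such as "1, Louise Smith, UK, Europe, 7, 5, 1, 0.20" to a new file, leaving the original untouched.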

CodePudding user response:

Using awk

$ awk 'BEGIN { FS=OFS=", " } NR == 1 { $8="new" } NR > 1 { $8=$NF/$(NF-1) } 1' input_file
id, name, country, Continent, grade, passed, failed, new
1, Louise Smith, UK, Europe, 7, 5, 1, 0.2
2, Okio Kiomoto, Japan, Asia, 9, 5, 0, 0
3, Ralph Watson, USA, Northern America, 5.6, 5, 2, 0.4
4, Mary Mcaann, South Africa, Africa, 4.7, 5, 3, 0.6
5, Jack Thomson, Australia, Oceania, 10, 5, 0, 0
6, N'dongo Mbaye, Senegal, Africa, 7.9, 5, 1, 0.2
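
One caveat: the one-liner divides by the passed column, so a row with passed equal to 0 would make awk abort with a division-by-zero error. The sample data has no such row, but if yours might, a guarded variant could look like the sketch below. The wrapper name add_ratio_awk is invented for illustration, and writing 0 for such rows is an assumption about what you would want there.

```shell
#!/bin/bash
# Sketch only: add_ratio_awk is a hypothetical wrapper name.
# Assumes ", " separated fields with passed and failed as the last two columns.
add_ratio_awk() {
    awk 'BEGIN { FS = OFS = ", " }
         # header row: append the new column name
         NR == 1 { print $0, "new"; next }
         # data rows: failed / passed, or 0 when passed is 0
         {
             passed = $(NF - 1)
             failed = $NF
             print $0, (passed != 0 ? failed / passed : 0)
         }' "$1"
}
```

Running add_ratio_awk input_file prints the same rows as above for the sample data.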