read file line by line and sum each line individually

Time:11-02

I'm trying to write a script that creates a file, say file01.txt, with one number on each line:

001
002
...
998
999

Then I want to read the file line by line, sum the digits of each line, and say whether that sum is even or odd.
For example, 0 + 0 + 1 = 1, which is odd,
and 9 + 9 + 8 = 26, which is even:

001 odd
002 even
..
998 even
999 odd

I tried

while IFS=read -r line; do sum+=line >> file02.txt; done <file01.txt

but that sums the whole file not each line.

CodePudding user response:

You can do this fairly easily in bash itself, using the built-in parameter expansions to trim the leading zeros from each line before summing its digits for the odd / even test.

When reading from a file (either a named file or stdin), you can use a parameter expansion with a default value so that the first argument (positional parameter) is used as the filename if given, and stdin is read otherwise, e.g.

#!/bin/bash

infile="${1:-/dev/stdin}"     ## read from file provided as $1 or stdin

You then use infile with your while loop, e.g.

while read -r line; do        ## loop reading each line
  ...
done < "$infile"

To trim the leading zeros, first obtain the substring of leading zeros by trimming everything from the first nonzero digit to the end of the line, so that only the zeros remain, e.g.

  leading="${line%%[1-9]*}"                         ## get leading 0's

Now, using the same type of parameter expansion with # instead of %%, trim the leading-zeros substring from the front of line, saving the resulting number in value, e.g.

  value="${line#$leading}"                          ## trim from front
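As a quick check of those two expansions, here is a minimal sketch that wraps them in a function and runs them on a few sample values. The name strip_leading_zeros is hypothetical and not part of the answer's script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original answer): strip leading zeros
# from a string of digits using the two parameter expansions shown above.
strip_leading_zeros() {
  local line=$1
  local leading="${line%%[1-9]*}"     # drop everything from the first nonzero digit on
  local value="${line#"$leading"}"    # remove that zero prefix from the front
  printf '%s\n' "$value"
}

strip_leading_zeros 001   # prints 1
strip_leading_zeros 020   # prints 20
strip_leading_zeros 998   # prints 998
```

Note that a line consisting only of zeros, such as 000, comes out as an empty string, so the digit loop then runs zero times and the sum stays 0.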

Now zero your sum and loop over the digits in value to obtain the sum of digits:

  for ((i=0;i<${#value};i++)); do                   ## loop summing digits
    sum=$((sum + ${value:$i:1}))
  done

All that remains is your even / odd test. Putting it all together in a short example script that intentionally outputs the sum of digits in addition to your wanted "odd" / "even" output, you could do:

#!/bin/bash

infile="${1:-/dev/stdin}"     ## read from file provided as $1 or stdin

while read -r line; do                              ## read each line
  [ "$line" -eq "$line" 2>/dev/null ] || continue   ## validate integer
  
  leading="${line%%[1-9]*}"                         ## get leading 0's
  value="${line#$leading}"                          ## trim from front
  sum=0                                             ## zero sum
  
  for ((i=0;i<${#value};i++)); do                   ## loop summing digits
    sum=$((sum + ${value:$i:1}))
  done
  
  printf "%s (sum=%d) - " "$line" "$sum"            ## output line w/sum
                                                    ## (temporary output)
  if ((sum % 2 == 0)); then                         ## check odd / even
    echo "even"
  else
    echo "odd"
  fi
done < "$infile"

(note: you can actually loop over the digits in line directly and skip removing the leading-zeros substring. The removal ensures that if the whole value is ever used in an arithmetic context, it isn't interpreted as an octal value; up to you)
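To illustrate the octal point in the note, here is a small sketch. It uses bash's base#number arithmetic syntax as an alternative to trimming the zeros; the helper name to_decimal is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: interpret a zero-padded string of digits as base 10.
to_decimal() {
  printf '%d\n' "$((10#$1))"
}

echo "$((010))"     # a leading 0 means octal in bash arithmetic, so this prints 8
to_decimal 010      # prints 10
to_decimal 008      # prints 8; a bare $((008)) is an error because 8 is not an octal digit
```

Summing single digits, as the loop above does, never hits this problem, because each operand is only one character long.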

Example Use/Output

Using a quick process substitution to provide input of 001 - 020 on stdin you could do:

$ ./sumdigitsoddeven.sh < <(printf "%03d\n" {1..20})
001 (sum=1) - odd
002 (sum=2) - even
003 (sum=3) - odd
004 (sum=4) - even
005 (sum=5) - odd
006 (sum=6) - even
007 (sum=7) - odd
008 (sum=8) - even
009 (sum=9) - odd
010 (sum=1) - odd
011 (sum=2) - even
012 (sum=3) - odd
013 (sum=4) - even
014 (sum=5) - odd
015 (sum=6) - even
016 (sum=7) - odd
017 (sum=8) - even
018 (sum=9) - odd
019 (sum=10) - even
020 (sum=2) - even

You can simply remove the output of "(sum=X)" when you have confirmed it operates as you expect and redirect the output to your new file. Let me know if I understood your question properly and if you have further questions.
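As a rough sketch of that last step, the version below drops the temporary "(sum=N)" output and writes the result to a second file. It assumes the file names from the question, loops over the raw digits without trimming the zeros (as the note above allows), and uses a scratch directory from mktemp so it can be run anywhere; the function name classify is hypothetical.

```shell
#!/bin/bash
# Hypothetical function: read lines of digits on stdin, print "<line> odd|even".
classify() {
  local line sum i
  while IFS= read -r line; do
    [[ $line =~ ^[0-9]+$ ]] || continue        # skip anything that is not all digits
    sum=0
    for ((i = 0; i < ${#line}; i++)); do       # add up the digits one by one
      sum=$((sum + ${line:i:1}))
    done
    if ((sum % 2 == 0)); then
      printf '%s even\n' "$line"
    else
      printf '%s odd\n' "$line"
    fi
  done
}

# Usage: create file01.txt as in the question, then write the results to file02.txt.
workdir=$(mktemp -d)                           # scratch directory (assumption)
printf '%03d\n' {1..999} > "$workdir/file01.txt"
classify < "$workdir/file01.txt" > "$workdir/file02.txt"
head -n 3 "$workdir/file02.txt"
```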

CodePudding user response:

With GNU awk:

awk -vFS='' '{sum=0; for(i=1;i<=NF;i++) sum+=$i;
              print $0, sum%2 ? "odd" : "even"}' file01.txt

The FS awk variable defines the field separator. If it is set to the empty string (this is what the -vFS='' option does) then each character is a separate field.

The rest is straightforward: the block between curly braces is executed for each line of the input. It computes the sum of the fields with a for loop (NF is another awk variable; its value is the number of fields in the current record). It then prints the original line ($0) followed by the string even if the sum is even, else odd.
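Splitting every character into its own field with an empty FS is an extension that not every awk implements. As a hedged alternative, the sketch below walks the line with substr() and length(), which should behave the same way in any awk; the wrapper name classify_awk is made up for this example.

```shell
#!/bin/bash
# Hypothetical wrapper: per-line digit sum without relying on an empty FS.
classify_awk() {
  awk '{
    sum = 0
    for (i = 1; i <= length($0); i++)     # visit each character of the line
      sum += substr($0, i, 1)             # a one-character string, used as a number
    print $0, (sum % 2 ? "odd" : "even")
  }'
}

printf '%s\n' 001 002 998 999 | classify_awk
```

For the four sample lines this prints odd, even, even and odd respectively, each after the original line.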

CodePudding user response:

Would you please try the bash version:

parity=("even" "odd")
while IFS= read -r line; do
    mapfile -t ary < <(fold -w1 <<< "$line")
    sum=0
    for i in "${ary[@]}"; do
        (( sum += i ))
    done
    echo "$line" "${parity[sum % 2]}"
done < file01.txt > file92.txt
  • fold -w1 <<< "$line" breaks the string $line into lines of one character each (one digit per line).
  • mapfile assigns the lines produced by the fold command to the elements of the array ary.
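To see what those two commands produce, here is a minimal sketch for a single sample value (998 is just an example input):

```shell
#!/bin/bash
line=998
# fold -w1 emits one character per line; mapfile -t stores each line
# (without its trailing newline) as one array element.
mapfile -t ary < <(fold -w1 <<< "$line")

echo "${#ary[@]} elements: ${ary[*]}"    # prints: 3 elements: 9 9 8
```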

Please note that this bash script is slow, because it starts a fold process for every line, so it is not suitable for large inputs.

CodePudding user response:

pure awk:

BEGIN {
    for (i=1; i<=999; i++) {
        printf("%03d\n", i) > ARGV[1]
    }
    close(ARGV[1])

    ARGC = 2
    FS = ""

    result[0] = "even"
    result[1] = "odd"
}

{
    printf("%s: %s\n", $0, result[($1 + $2 + $3) % 2])
}

Processing a file line by line, and doing math, is a perfect task for awk.

pure bash:

set -e

printf '%03d\n' {1..999} > "${1:?no path provided}"

result=(even odd)

mapfile -t num_list < "$1"

for i in "${num_list[@]}"; do
    echo $i: ${result[(${i:0:1} + ${i:1:1} + ${i:2:1}) % 2]}
done

A similar method can be applied in bash, but it's slower.

comparison:

bash is about 10x slower.

$ cd ./tmp.Kb5ug7tQTi

$ bash -c 'time awk -f ../solution.awk numlist-awk > result-awk'

real    0m0.108s
user    0m0.102s
sys     0m0.000s

$ bash -c 'time bash ../solution.bash numlist-bash > result-bash'

real    0m0.931s
user    0m0.929s
sys     0m0.000s

$ diff --report-identical result*
Files result-awk and result-bash are identical

$ diff --report-identical numlist*
Files numlist-awk and numlist-bash are identical

$ head -n 5 *
==> numlist-awk <==
001
002
003
004
005

==> numlist-bash <==
001
002
003
004
005

==> result-awk <==
001: odd
002: even
003: odd
004: even
005: odd

==> result-bash <==
001: odd
002: even
003: odd
004: even
005: odd
  • read is a bottleneck in a while IFS= read -r line loop. More info in this answer.
  • mapfile (combined with for loop) can be slightly faster, but still slow (it also copies all the data to an array first).
  • Both solutions create a number list in a new file (which was in the question), and print the odd/even results to stdout. The path for the file is given as a single argument.
  • In awk, you can set the field separator to empty (FS="") to process individual characters.
  • In bash it can be done with substring expansion (${var:index:length}).
  • Modulo 2 (number % 2) to get odd or even.

CodePudding user response:

The parity of the sum of the digits can be determined by counting the number of odd digits.

awk '{n = gsub(/[13579]/, "&"); print $0, (n % 2 ? "odd" : "even")}' file

If there is an odd number of odd digits then the sum of the digits must be odd.

perl -lpe '$_ .= $" . (y/13579// % 2 ? odd : even)' file

sed 'h;s/[^13579]*//g;s/..//g;s/./odd/;s/^$/even/;H;g;y/\n/ /' file

#!/bin/bash
while IFS= read -r x
do
  y=${x//[!13579]}
  if ((${#y}%2)); then y=odd; else y=even; fi
  printf '%s\n' "$x $y"
done < file
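As a sanity check on that shortcut, the sketch below compares the parity obtained from the full digit sum with the parity obtained by counting odd digits, for every three-digit string. The function names sum_parity and count_parity are hypothetical.

```shell
#!/bin/bash
# Reference method: parity (0 or 1) of the full digit sum.
sum_parity() {
  local s=$1 sum=0 i
  for ((i = 0; i < ${#s}; i++)); do
    sum=$((sum + ${s:i:1}))
  done
  echo $((sum % 2))
}

# Shortcut: parity (0 or 1) of the number of odd digits.
count_parity() {
  local s=$1
  local odd=${s//[!13579]/}      # delete every character that is not an odd digit
  echo $(( ${#odd} % 2 ))
}

mismatches=0
for n in {000..999}; do
  [ "$(sum_parity "$n")" = "$(count_parity "$n")" ] || mismatches=$((mismatches + 1))
done
echo "mismatches: $mismatches"   # prints: mismatches: 0
```

The two always agree because even digits contribute nothing to the sum modulo 2, while each odd digit contributes exactly 1.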