How to print output in table format in shell script

I am new to shell scripting. I want to lay out all the data from a file in table format and redirect the output into another file.

I have the input file File.txt below:

Fruit_label:1 Fruit_name:Apple
Color:Red
Type: S
No.of seeds:10
Color of seeds :brown
Fruit_label:2 fruit_name:Banana
Color:Yellow
Type:NS

I want it to look like this:

1 |   apple |   red |  S |  10 |   brown 
2 |   banana|   yellow |  NS

I want to read the data line by line from the text file, build a header such as fruit_label, fruit_name, color, type, no.of seeds, color of seeds, and then print the assigned values in rows. The data differs from fruit to fruit; for example, a banana doesn't have seeds, so I want to keep those values blank in its row.

Can anyone help me here?

CodePudding user response：

Here is my solution. It is a New Year gift; usually you have to show what you have tried so far, and we help you rather than do it for you.

Disclaimer: some guru will probably come up with a simpler awk version, but this works.

File script.awk

# Remove leading whitespace
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
# Remove trailing whitespace
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
# Remove both leading and trailing whitespace
function trim(s) { return rtrim(ltrim(s)); }

# Initialise or reset a fruit array
function array_init() {
    for (i = 0; i <= 6; ++i) {
        fruit[i] = ""
    }
}

# Print the content of the fruit
function array_print() {
    # Track whether anything was printed, so that the trailing newline
    # is emitted only for a non-empty array.
    printedsomething = 0
    for (i = 0; i <= 6; ++i) {
        # Do not print if the content is empty
        if (fruit[i] != "") {
            printedsomething = 1
            if (i == 1) {
                # The first field must be further split, to remove "Fruit_name"
                # Split on the space
                split(fruit[i], temparr, / /)
                printf "%s", trim(temparr[1])
            }
            else {
                printf " |   %s", trim(fruit[i])
            }
        }
    }
    if ( printedsomething == 1 ) {
        print ""
    }
}

BEGIN {
    FS = ":"
    print "Fruit_label| Fruit_name |color| Type |no.of seeds |Color of seeds"
    array_init()
}

/Fruit_label/ {
    array_print()
    array_init()
    fruit[1] = $2
    fruit[2] = $3
}
/Color:/ {
    fruit[3] = $2
}
/Type/ {
    fruit[4] = $2
}
/No.of seeds/ {
    fruit[5] = $2
}
/Color of seeds/ {
    fruit[6] = $2
}

END { array_print() }

To execute, call awk -f script.awk File.txt (append, for example, > output.txt to redirect the table into another file).
awk processes a file line by line. So the idea is to store the information for one fruit in an array.
Every time a line "Fruit_label:....." is found, print the current fruit and start a new one.
Since each line is read in sequence, you tell awk what to do with each line, based on a pattern.
The patterns are the expressions enclosed between / characters at the beginning of each block of code.
Difficulty: since the first line contains two pieces of information about each fruit, and I split the lines on the : character, the Fruit_label value will include "Fruit_name".
I.e. the first line is split like this: $1 = Fruit_label, $2 = 1 Fruit_name, $3 = Apple
This is why the array_print() function is so complicated.
The trim functions are there to remove spaces.
For example, for the Apple, Type: S split on the : results in " S" with a leading space, which trim() removes.
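
To see the splitting and trimming in isolation, here is a minimal sketch. The helper name show_fields is made up for illustration and is not part of script.awk; it prints every field awk produces with ":" as the separator, next to a whitespace-trimmed copy.

```shell
# Hypothetical helper (not part of script.awk): print each ":"-separated
# field as awk sees it, followed by the same value with surrounding
# blanks removed.
show_fields() {
    awk -F: '{
        for (i = 1; i <= NF; ++i) {
            v = $i
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # strip leading/trailing blanks
            printf "$%d=[%s] trimmed=[%s]\n", i, $i, v
        }
    }'
}

printf 'Fruit_label:1 Fruit_name:Apple\n' | show_fields
printf 'Type: S\n' | show_fields
```

The first call shows that $2 still carries "Fruit_name", and the second shows the leading space in front of S that the trim functions take care of.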

If it meets your requirements, please see https://stackoverflow.com/help/someone-answers to accept it.

CodePudding user response：

Another approach is "Decorate & Process". What is that? To Decorate is to take the text you have and add another separator to it to make field-splitting easier. In your case the fields can contain embedded whitespace, and the whitespace around the ':' separator between the field-names and values is inconsistent, which makes the input a nightmare to process simply.

So instead of worrying about what the separator is, think about "What should the fields be?" and then add a new separator (Decorate) between the fields and then Process with awk.

Here sed is used to Decorate your input with '|' as separators (a second call eliminates the '|' after the last field), and then a simpler awk script uses split() on ':' to obtain the field-name and field-value. The field-value is simply printed and the field-names are stored in an array that works as a "seen" set: when a field-name that is already in the array shows up again, that marks the change between records, e.g.

sed -E 's/([^:]+:[[:blank:]]*[^[:blank:]]+)[[:blank:]]*/\1|/g' file |
sed 's/|$//' |
awk '
  BEGIN { FS = "|" }
  {
    for (i=1; i<=NF; i++) {
      # split "name : value" on the colon and any blanks around it
      if (split ($i, parts, /[[:blank:]]*:[[:blank:]]*/)) {
        # first field overall, or a field-name already seen: new record
        if (! n || parts[1] in fldnames) {
          printf "%s %s", n ? "\n" : "", parts[2]
          delete fldnames
          n = 1
        }
        else
          printf " | %s", parts[2]
        fldnames[parts[1]]++
      }
    }
  }
  END { print "" }
'

Example Output

With your input in file you would have:

 1 | Apple | Red | S | 10 | brown
 2 | Banana | Yellow | NS
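
To check what the Decorate step alone produces, the two sed calls can be run without the awk stage. This is only a sketch: the function name decorate is made up for illustration, and three sample lines from the question are fed in through printf instead of a file.

```shell
# Hypothetical wrapper around the two sed calls: put "|" after every
# "name:value" pair, then drop the "|" left at the end of each line.
decorate() {
    sed -E 's/([^:]+:[[:blank:]]*[^[:blank:]]+)[[:blank:]]*/\1|/g' |
    sed 's/|$//'
}

printf '%s\n' 'Fruit_label:1 Fruit_name:Apple' 'Type: S' 'Color of seeds :brown' |
decorate
```

Only the first line has two pairs, so it is the only one that keeps a '|' between them; lines with a single pair come out unchanged.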

You will also see "Decorate-Sort-Undecorate" used to sort data on a column of values that does not exist in the input: you "Decorate" your data with a new last field, sort on that field, and then "Undecorate" to remove the additional field when sorting is done. This allows sorting by data that may be the sum (or combination) of any two columns, etc.
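
A minimal sketch of that idea, assuming made-up three-column rows of the form "name a b" that should be ordered by a + b (the function name sort_by_sum and the sample data are hypothetical):

```shell
# Hypothetical helper: order "name a b" rows by the sum a + b.
sort_by_sum() {
    awk '{ print $0, $2 + $3 }' |   # Decorate: append the sum as a new last field
    sort -k4,4n |                   # Sort numerically on that new field
    sed 's/ [^ ]*$//'               # Undecorate: drop the last field again
}

printf '%s\n' 'a 3 4' 'b 1 1' 'c 2 9' | sort_by_sum
```

The sums are 7, 2 and 11, so the rows come out in the order b, a, c, with the temporary fourth column removed again.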