Searching for character occurrences in a line within a file in Linux and editing it

Time:06-26

I have a delimited file that I want to load into a database on Linux. The problem is that the number of delimited fields is not the same in every row. I need a shell script that iterates over each line and counts the occurrences of the delimiter character; I need 13 occurrences per line. So if a line has 10, for example, I need to add 3 extra delimiter characters at the end of that line.

All I have so far is this:

#!/usr/bin/bash
while read p; do
  if
  -----------
  fi
done <myDataFile

CodePudding user response:

The information provided is rather sparse.

What's your delimiter?

Can the delimiter (quoted) occur in any of the fields?

For a simple case, e.g. the delimiter is "|" and never occurs inside a field, here's a quick awk hack.

$ cat myDataFile 
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|g|h|i|m
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|i|j|k|l|m
a|b|c|d|e|f|g|h|i|j|k|l|m

And the awk:

$ awk -F'|' '{missing=13-NF;if(missing==0){print $0}else{printf "%s",$0;for(i=1;i<=missing-1;i++){printf "|"};print "|"}}' myDataFile
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|g|h|i|m|||
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|d|e|f|g|h|i|j|k|l|m|
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|g|h|i|j|k|l|m
a|b|c|d|e|f|i|j|k|l|m||
a|b|c|d|e|f|g|h|i|j|k|l|m
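
Before loading the padded file, it may be worth confirming that every line really ends up with the same number of fields. A minimal sketch of such a check, assuming the delimiter is "|" and never occurs inside a field; the function name count_fields is made up for this example:

```shell
# count_fields FILE: print "<field count> <number of lines>" pairs,
# smallest field count first. Hypothetical helper, assumes '|' as delimiter.
count_fields() {
    awk -F'|' '{ seen[NF]++ } END { for (n in seen) print n, seen[n] }' "$1" | sort -n
}
```

Run against the original myDataFile above, this should report one line each with 10, 11 and 12 fields and five lines with 13. Run against the padded output, it should print the single line `13 8`.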

And the padding awk program, formatted and commented:

{
        missing = 13 - NF      # store the number of missing fields
        if (missing == 0) {    # if all fields are present
                print $0       # just print the line
        } else {               # otherwise
                printf "%s", $0        # first print the line
                for (i = 1; i <= missing - 1; i++) {   # then pad the line with delimiters (w/o a newline)
                        printf "|"
                }
                print "|"      # followed by a last one WITH a newline
        }
}
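
For comparison, the while/read loop from the question could be completed in plain shell along the following lines. This is only a sketch under the same assumptions (delimiter "|", never quoted inside a field). The names pad_line and pad_file are invented for the example. The default target of 12 delimiters (13 fields) mirrors the awk above; pass 13 as the second argument if the goal is literally 13 delimiter characters per line.

```shell
#!/usr/bin/env bash
# pad_line LINE [WANT]: print LINE with '|' appended until it holds WANT delimiters.
# WANT defaults to 12, i.e. 13 fields (an assumption taken from the awk answer).
pad_line() {
    local line="$1" want="${2:-12}" rest="$1" have=0
    # Count delimiters by repeatedly chopping off everything up to the next '|'.
    while [ "${rest#*|}" != "$rest" ]; do
        rest="${rest#*|}"
        have=$((have + 1))
    done
    # Append however many delimiters are still missing.
    while [ "$have" -lt "$want" ]; do
        line="$line|"
        have=$((have + 1))
    done
    printf '%s\n' "$line"
}

# pad_file FILE [WANT]: apply pad_line to every line of FILE.
pad_file() {
    local p
    # "|| [ -n "$p" ]" also handles a final line without a trailing newline.
    while IFS= read -r p || [ -n "$p" ]; do
        pad_line "$p" "${2:-12}"
    done < "$1"
}
```

Usage would be something like `pad_file myDataFile > myDataFile.padded`. For large files the awk version is likely to be considerably faster, since a shell loop processes the input one line at a time.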