I want to split a file based on the first character of each line and create one output file per first character. I am doing:
awk '{print > substr($0, 1, 1)}' "$File"
But awk fails with "fatal: expression for `>' redirection has null string value".
The file contains some blank lines.
How do I ignore the blank lines while doing the split?
The content of $File is:
100009-01 -- This should go in file named 1
200009-01 -- This should go in file named 2
300009-01 -- This should go in file named 3
400009-01
500009-01
600037-01
700037-01
800037-01
900037-01
100037-01 -- This should go in file named 1
A0037-02_ -- This should go in file named A
a00037-02 -- This should go in file named a
c00037-02
B00037-02
200037-02
It should generate a file named "1", and all the lines starting with 1 should go into this file (and likewise for every other first character).
Thanks
Answer:
With your shown samples, please try the following awk code.
LC_ALL=C sort -k1.1 Input_file |
awk '
!NF{ next }
{
currentFile=substr($1,1,1)
}
prev!=currentFile{
close(prev)
}
{
print > (currentFile)
prev=currentFile
}
'
Explanation: here is the same code with a comment on each step.
LC_ALL=C sort -k1.1 Input_file | ##Sort Input_file byte-wise (LC_ALL=C) so lines with the same 1st character are adjacent and each output file is opened only once.
awk ' ##Send the sorted output to awk as its input.
!NF{ next } ##If it's an empty line, move on to the next line.
{
currentFile=substr($1,1,1) ##Set currentFile to the 1st character of the current line.
}
prev!=currentFile{ ##If prev is NOT equal to currentFile, do the following.
close(prev) ##Close the previous output file so awk does not hit the open-file limit.
}
{
print > (currentFile) ##Print the current line into the currentFile output file.
prev=currentFile ##Remember currentFile as prev for the next line.
}
'
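To try the sorted-input approach end to end, here is a minimal sketch that writes a few of the question's sample lines (plus blank lines) into a scratch directory and runs the same pipeline. The scratch directory and the reduced sample are assumptions for the demo; the byte-wise sort (LC_ALL=C) keeps lines starting with "a" and "A" in separate, contiguous groups, so no output file is reopened (which, with >, would truncate it).

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the generated files (1, 2, A, a) don't clutter the cwd.
cd "$(mktemp -d)" || exit 1

# A few sample lines from the question, including blank lines.
printf '%s\n' '100009-01' '' '200009-01' 'A0037-02_' '' '100037-01' 'a00037-02' > Input_file

LC_ALL=C sort -k1.1 Input_file |
awk '
!NF{ next }                       # skip blank lines
{ currentFile=substr($1,1,1) }    # output file name = first character
prev!=currentFile{ close(prev) }  # finished with the previous group
{ print > (currentFile); prev=currentFile }
'

ls   # lists the created files next to Input_file
```

File 1 ends up with both lines starting with 1, while A and a get one line each.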
Answer:
The file contains some blank lines. How do I ignore the blank lines while I do the split?
If this is the only problem, you can simply fix your code this way:
awk '$0!=""{print > substr($0, 1, 1)}' "$File"
Explanation: I added a condition to your action which is true if the whole line ($0) is not equal (!=) to the empty string (""), so empty lines are ignored.
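One caveat, assuming the input may contain lines made only of spaces or tabs: such a line is not equal to "", so it passes this test and is written to a file whose name is a single space (a test on NF would skip it instead). A minimal sketch with a made-up three-line input:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical input: the middle line contains only spaces.
printf '1a\n   \n2b\n' > in.txt

awk '$0!=""{print > substr($0, 1, 1)}' in.txt

ls -A   # in.txt, 1, 2 and a file literally named " "
```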
Answer:
Here is a minor update to your original code:
awk 'NF{print > substr($1, 1, 1)}' "$File"
Since awk works with (pattern){action} rules, the action is taken only when the pattern evaluates to non-zero or non-empty. The value of NF is the number of fields in the current record (line), so by using NF as the pattern, awk performs the action only if the current line contains at least one non-whitespace character (with the default field separator).
Besides that, it uses $1 instead of $0: if a line starts with spaces, the file name is still taken from the first character of the first field rather than from a space.
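A small sketch of the $1 point, using a made-up input where one line is indented: that line still goes into file 3, and because print with no arguments prints the unmodified record, its leading spaces are kept in the output.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical input: a normal line, a blank line, and an indented line.
printf '100009-01\n\n   300009-01\n' > in.txt

awk 'NF{print > substr($1, 1, 1)}' in.txt

cat 3   # prints "   300009-01" (leading spaces preserved)
```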
Answer:
Here's how it could be done with bash:
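while read -r line; do
  [[ -n $line ]] || continue               # skip blank lines (read also trims leading/trailing whitespace)
  printf '%s\n' "$line" >> "${line:0:1}"   # append the line to a file named after its first character
done < "$File"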
Answer:
I don't know how to put this into one shell script, but you can build on the following:
cut -c 1 test.txt | sort | uniq
This gives the list of distinct first characters present in your file (which are also the file names you're about to create).
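When the file has blank lines, cut -c 1 prints an empty line for each of them, so this list also contains one empty entry. A minimal sketch, assuming a hypothetical test.txt built from a few of the question's lines, that drops it with grep . (which keeps only non-empty lines); sort -u is used as the equivalent of sort | uniq:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical test.txt: a few sample lines from the question plus a blank line.
printf '%s\n' '100009-01' '' '200009-01' 'A0037-02_' '100037-01' > test.txt

# Distinct first characters; grep . removes the empty entry produced by the blank line.
cut -c 1 test.txt | sort -u | grep .
```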
grep "^1" test.txt
This gives you all the lines of your file, starting with "1".
Take care: don't use a>file, because that truncates the file every time it runs. I propose a>>file instead, which creates the file if it doesn't exist and appends to it otherwise.
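So, as a bash loop, you should get something like:
for a in $(cut -c 1 test.txt | sort -u); do   # distinct first characters; the empty entry from blank lines is dropped by word splitting
  grep "^$a" test.txt >> "$a"                 # assumes first characters are not regex or glob metacharacters
done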