I am trying to collect all text files in a directory and merge them into one listing, where each line holds a file name followed by the corresponding contents of that file. The program currently produces the following.
Sample output
Filename Contents
001.txt abadsadsad
002.txt abadsadsad
003.txt abadsadsad
Desired Output
001 abadsadsad
002 abadsadsad
003 abadsadsad
Code:
target= echo "Enter target directory: "
read target
mkdir .dump
mv $target/o1.txt $target/.dump/o1-old.txt
mv $target/o2.txt $target/.dump/o2-old.txt
mv $target/file-content-list.txt $target/.dump/output-old.txt || true #Ensure no o1,o2 and file-content-list.txt file is in target
for f in "$target"/*;
do
echo -e $(basename "$f" '\t') >>o1.txt && echo $(cat "$f") >>o2.txt
done
#| awk 'END { printf("File count: %d", NR); } NF=NF' ## Use this one with "done" (previous line) to get file count if needed
paste -d' ' $target/o1.txt $target/o2.txt | column -s $'\t' -t >> file-content-list.txt #Output file is printed. Remove it and from the target if you plan on reusing there.
rm $target/o1.txt
rm $target/o2.txt
How do I optimize this code? Also, is there a bash command that can remove the .txt from the first column, like a delimiter of sorts? There is also a sorting issue: if the file names are 1, 2, 3 and so on, they are sorted like this:
1
10
2
20
3
I always end up having to name them 0001, 0002 and so on.
How do we fix this?
CodePudding user response:
Consider:
# For each txt file
for f in "$target"/*.txt; do
# Output the filename without the .txt extension.
basename "$f" .txt
# Output the file contents with newlines replaced by a space,
# then end the line so that paste sees exactly two lines per file.
tr '\n' ' ' <"$f"
echo
done |
# Join each pair of output lines with a tab. The delimiter is arbitrary; it only has to match the one that column reads.
paste -d $'\t' - - |
# Columnate the output.
column -s $'\t' -t
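The loop above still lists files in glob order (1, 10, 2, ...), so here is a minimal sketch that also covers the sorting part of the question. It assumes a `sort` that supports `-V` (version sort, a GNU coreutils extension that some other systems also ship) and file names without embedded newlines. The function name `merge_txt_files` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): print "<name><TAB><contents>"
# for every .txt file in the given directory, in natural numeric order.
merge_txt_files() {
    local dir=${1:-.} f name
    # sort -V orders 1, 2, 10 instead of 1, 10, 2 (assumes sort supports -V).
    printf '%s\n' "$dir"/*.txt | sort -V | while IFS= read -r f; do
        # Skip the literal pattern left behind when nothing matches.
        [ -f "$f" ] || continue
        name=${f##*/}        # strip the directory part
        name=${name%.txt}    # strip the .txt extension
        # paste -s joins all lines of the file into one, separated by spaces.
        printf '%s\t%s\n' "$name" "$(paste -s -d ' ' -- "$f")"
    done
}
```

It can be piped into column in the same way as above: `merge_txt_files "$target" | column -s $'\t' -t`. The `${name%.txt}` parameter expansion does the same job as `basename "$f" .txt` without starting an extra process per file.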
CodePudding user response:
With gawk
gawk '
BEGINFILE {filename = FILENAME; sub(/\.[^.]+$/, "", filename)}
{print filename, $0}
' *.txt | sort -k1,1n
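BEGINFILE is a gawk extension. If only a plain POSIX awk is available, roughly the same thing can be sketched by recomputing the name on the first line of each file (`FNR == 1`). The function name `list_txt_lines` is made up for illustration, and like the gawk version it prints one output line per input line and prints nothing for empty files.

```shell
# Hypothetical helper (name is an assumption): prefix every line of the given
# files with the file name minus its extension, then sort numerically by name.
list_txt_lines() {
    awk '
        # First line of each file: remember the name without its extension.
        FNR == 1 { name = FILENAME; sub(/\.[^.]+$/, "", name) }
        # Every line: print the stripped name followed by the line itself.
        { print name, $0 }
    ' "$@" | sort -k1,1n
}
```

Run it from inside the directory as `list_txt_lines *.txt`. Note that `sort -k1,1n` may reorder lines that come from the same file, since ties on the first field fall back to comparing whole lines; where the sort implementation has `-s`, adding it should keep them in their original order.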