How to merge multiple text files and save them as csv or txt without sorting error?

Time:07-07

I am trying to read every text file in a directory and merge them into one listing, with one line per file: the file name followed by that file's contents. The program currently produces the following.

Sample output

Filename Contents
001.txt abadsadsad
002.txt abadsadsad
003.txt abadsadsad

Desired Output

001 abadsadsad       
002 abadsadsad
003 abadsadsad 

Code:

read -r -p "Enter target directory: " target

mkdir .dump
mv $target/o1.txt $target/.dump/o1-old.txt 
mv $target/o2.txt $target/.dump/o2-old.txt
mv $target/file-content-list.txt $target/.dump/output-old.txt || true #Ensure no o1,o2 and file-content-list.txt file is in target

for f in "$target"/*;
do
    echo -e $(basename "$f" '\t') >>o1.txt && echo $(cat "$f") >>o2.txt
done 
#| awk 'END { printf("File count: %d", NR); } NF=NF' ## Use this one with "done" (previous line) to get file count if needed

paste -d' ' $target/o1.txt $target/o2.txt | column -s $'\t' -t >> file-content-list.txt #Output file is printed. Remove it and from the target if you plan on reusing there.
rm $target/o1.txt 
rm $target/o2.txt

How do I optimize this code? Also, is there a bash command that can remove the .txt from the first column, something like a delimiter? There is also a sorting issue: if the files are named 1, 2, 3 and so on, they are sorted like this:

1
10
2
20
3

I always end up having to name them 0001, 0002 and so on.

How do we fix this?

CodePudding user response:

Consider:

# For each txt file
for f in "$target"/*.txt; do
   # Output the file name without the .txt extension.
   basename "$f" .txt
   # Output the file contents with newlines replaced by a space.
   tr '\n' ' ' <"$f"
   # Terminate the line, because tr also replaced the final newline.
   echo
done |
# Join each pair of output lines with a tab. The delimiter is arbitrary; it only has to match what column reads.
paste -d $'\t' - - |
# Columnate the output.
column -s $'\t' -t
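
Note that the glob in the loop above expands in lexical order, so 10.txt still comes before 2.txt. Here is a minimal sketch of one way around that, assuming GNU sort (for its -V version sort) and file names without embedded newlines. merge_txt is a hypothetical helper name, and paste -s is swapped in for tr to join the lines of each file, so this is a variation on the loop rather than the same code:

```shell
# merge_txt DIR: print "name<TAB>contents" for every .txt file in DIR,
# ordered 1, 2, 10 rather than 1, 10, 2.
# Hypothetical helper; assumes GNU sort (-V) and no newlines in file names.
merge_txt() (
    dir=$1
    # One path per line, version-sorted so embedded numbers compare numerically.
    printf '%s\n' "$dir"/*.txt | sort -V |
    while IFS= read -r f; do
        # If the glob matched nothing it is passed through literally; skip it.
        [ -e "$f" ] || continue
        # File name without .txt, a tab, then the file's lines joined by spaces.
        printf '%s\t%s\n' "$(basename "$f" .txt)" "$(paste -s -d ' ' "$f")"
    done
)
```

It could then be used as merge_txt "$target" | column -s "$(printf '\t')" -t > file-content-list.txt, which also avoids the temporary o1.txt and o2.txt files.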

CodePudding user response:

With GNU awk (BEGINFILE is a gawk extension):

gawk '
    BEGINFILE {filename = FILENAME; sub(/\.[^.]+$/, "", filename)}
    {print filename, $0}
' *.txt | sort -k1,1n
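
The two things the question asks about can also be tried in isolation. This is a small sketch assuming only a POSIX shell and sort; strip_ext is a hypothetical helper name used for illustration. ${1%.txt} is the shell's own way to drop a trailing .txt, and sort -n (or -k1,1n to restrict it to the first field, as above) compares numerically instead of character by character:

```shell
# strip_ext NAME: print NAME without a trailing .txt (hypothetical helper).
strip_ext() { printf '%s\n' "${1%.txt}"; }

strip_ext 001.txt                          # prints 001

# Default sort compares character by character, so 10 lands before 2.
printf '%s\n' 1 10 2 20 3 | sort           # prints 1 10 2 20 3, one per line

# -n compares the leading number, giving the order the question wants.
printf '%s\n' 1 10 2 20 3 | sort -n        # prints 1 2 3 10 20, one per line
```

With numeric sorting in place there is no need to rename the files to 0001, 0002 and so on.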