How to split large *.csv files with headers in Bash?

Time:11-09

I need to split a big *.csv file into several smaller ones. It currently has 661497 rows, and I need each file to have at most 40000. I tried a solution that I found on GitHub, but with no success:

FILENAME=/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv
HDR=$(head -1 ${FILENAME})
split -l 40000 ${FILENAME} xyz
n=1
for f in xyz*
do
    if [[ ${n} -ne 1 ]]; then
        echo ${HDR} > part-${n}-${FILENAME}.csv
    fi
    cat ${f} >> part-${n}-${FILENAME}.csv
    rm ${f}
    ((n++))
done

The error I get:

/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/download.sh: line 23: part-1-/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/my_file.csv.csv: No such file or directory

Thanks for the help!

CodePudding user response:

Keep in mind that FILENAME contains both a directory and a file name, so later in the script, when you build the new filename, you get something like:

part-1-/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files/tyre_8.csv.csv
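
To see how the shell reads such a name, here is a small sketch. The path /tmp/example-dir/my_file.csv is a hypothetical stand-in for the real one; the second line of output is the directory that would have to exist for the redirection to succeed:

```bash
# Hypothetical path, used only for illustration.
FILENAME=/tmp/example-dir/my_file.csv
n=1

# Same construction as in the failing script.
target="part-${n}-${FILENAME}.csv"

echo "${target}"       # part-1-/tmp/example-dir/my_file.csv.csv
# Everything before the last slash is treated as a (relative) directory.
echo "${target%/*}"    # part-1-/tmp/example-dir
```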

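Because that name contains slashes, the shell treats part-1- as a directory relative to the current one, and it does not exist, hence the "No such file or directory" error. One quick-n-easy fix would be to split the directory and filename into 2 separate variables, e.g.: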

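srcdir='/home/cnf/domains/cnf.com.pl/public_html/sklep/dropshipping-pliki/products-files'
filename='tyre_8.csv'

# save the header line so it can be repeated in the later parts
hdr=$(head -n 1 "${srcdir}/${filename}")
# write 40000-line chunks named xyzaa, xyzab, ... into the current directory
split -l 40000 "${srcdir}/${filename}" xyz
n=1

for f in xyz*
do
    # the first chunk already starts with the header; add it to the others
    if [[ ${n} -ne 1 ]]; then
        echo "${hdr}" > "${srcdir}/part-${n}-${filename}"
    fi
    cat "${f}" >> "${srcdir}/part-${n}-${filename}"
    rm "${f}"
    ((n++))
done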
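
If the script only receives one full path, the two variables can probably be derived from it instead of being typed by hand. A minimal sketch, assuming a hypothetical variable fullpath and a made-up example path:

```bash
# Hypothetical full path, used only for illustration.
fullpath='/tmp/example-dir/my_file.csv'

# dirname keeps everything before the last slash, basename what follows it.
srcdir=$(dirname -- "${fullpath}")
filename=$(basename -- "${fullpath}")

echo "${srcdir}"      # /tmp/example-dir
echo "${filename}"    # my_file.csv
```

With these two values, "${srcdir}/part-${n}-${filename}" expands to a name inside the source directory rather than to a path under a missing part-1- directory.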

NOTES:

  • consider using lowercase variable names (uppercase names raise the possibility of problems if there's an environment or shell variable of the same name)
  • wrap variable references in double quotes in case the string contains spaces
  • there's no need to add a .csv extension to the new filename since it's already part of $filename
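
Putting the notes above together, one possible variant is sketched below. It is not the code from the answer: the function name split_csv and its two arguments are hypothetical, the temporary chunks go into a mktemp directory instead of the current directory, and the header is stripped before splitting so that every part gets exactly one header line followed by at most max_rows data rows.

```bash
#!/usr/bin/env bash
# Sketch only: split_csv and its arguments are hypothetical names.
# Usage: split_csv <path/to/file.csv> <max data rows per part>
split_csv() {
    local src=$1 max_rows=$2
    local srcdir filename hdr tmpdir chunk n=1

    srcdir=$(dirname -- "${src}")
    filename=$(basename -- "${src}")
    hdr=$(head -n 1 -- "${src}")
    tmpdir=$(mktemp -d) || return 1

    # Split only the data rows (line 2 onwards) into temporary chunks.
    tail -n +2 -- "${src}" | split -l "${max_rows}" - "${tmpdir}/chunk-"

    for chunk in "${tmpdir}"/chunk-*; do
        # If there were no data rows the glob stays unexpanded; skip it.
        [[ -e ${chunk} ]] || continue
        {
            printf '%s\n' "${hdr}"
            cat -- "${chunk}"
        } > "${srcdir}/part-${n}-${filename}"
        ((n++))
    done

    rm -rf -- "${tmpdir}"
}
```

A call such as split_csv "${srcdir}/${filename}" 40000 would then write part-1-tyre_8.csv, part-2-tyre_8.csv and so on next to the source file. Since the header is added on top of the data rows, each part can hold up to 40001 lines; pass 39999 if the header has to count towards the limit.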