I have several very large CSV (technically TSV) files that I need to append together. I had used:
copy file1.txt + file2.txt + ... + fileN.txt combined.txt
but then discovered that each file has a BOM at the start, which then appears multiple times in the middle of the combined file.
However, the files are very big (30-40 million lines each), so I can't open them in Notepad and re-save them to remove the BOMs. I need a command-line solution (either cmd or PowerShell), ideally one that doesn't require downloading extra libraries.
To recap:
- Files are too large to open in e.g. Notepad, so the solution needs to work from the command line
- This is on Windows, not *nix
(In my case N=4, so I could cope with a solution that removes the BOM from an individual file, running it on each file before combining.)
Edit: This may be a possible solution: "Batch script remove BOM from file", but my knowledge of encodings and PowerShell/batch is so poor that I can't even tell if it's applicable or not!
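To make the per-file step concrete, here is a minimal PowerShell sketch of what "remove the BOM from an individual file" could look like. It assumes the files are UTF-8, so the BOM is the three bytes EF BB BF; the encoding is not confirmed anywhere above, and a UTF-16 BOM would need a different check. The function name Remove-Utf8Bom and its parameters are hypothetical, and the sketch assumes PowerShell 3 or later with .NET 4 or later (for Stream.CopyTo). It streams the file rather than loading it into memory, so file size should not matter.

```powershell
# Hypothetical helper: copies $Source to $Destination, skipping a leading
# UTF-8 BOM (bytes EF BB BF) if one is present. Assumes UTF-8 input.
function Remove-Utf8Bom {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )

    # .NET resolves relative paths against the process working directory,
    # not PowerShell's current location, so convert to full paths first.
    $src = (Resolve-Path -LiteralPath $Source).ProviderPath
    $dst = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    $inStream = [System.IO.File]::OpenRead($src)
    try {
        $outStream = [System.IO.File]::Create($dst)
        try {
            # Read the first three bytes and compare them with the UTF-8 BOM.
            $head = New-Object byte[] 3
            $read = $inStream.Read($head, 0, 3)
            $hasBom = ($read -eq 3) -and
                      ($head[0] -eq 0xEF) -and
                      ($head[1] -eq 0xBB) -and
                      ($head[2] -eq 0xBF)

            if (-not $hasBom) {
                # No BOM found: these bytes are real data, so keep them.
                $outStream.Write($head, 0, $read)
            }

            # Copy the remainder of the file in buffered chunks.
            $inStream.CopyTo($outStream)
        }
        finally { $outStream.Dispose() }
    }
    finally { $inStream.Dispose() }
}

# Example call (file names are placeholders):
# Remove-Utf8Bom -Source file1.txt -Destination file1_nobom.txt
```

Run once per input file, this would produce BOM-free copies that could then be combined. Writing to a separate destination instead of overwriting in place leaves the originals untouched if something goes wrong.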