How to truncate a CSV file to N rows without reading the whole file

Time:07-19

I have a big CSV file (12 GB), so I can't read it into memory. I need only the first 100 rows, and I want to save them back to the same file (i.e. truncate it). Does Java have an API for this?

Answer:

You should stream the file: read it line by line, so that only one line is held in memory at a time.

For example, with the OpenCSV library's CSVReader:

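    import com.opencsv.CSVReader;
    import java.io.FileReader;
    import java.util.Arrays;

    // try-with-resources closes the reader (and the file) when done
    try (CSVReader reader = new CSVReader(new FileReader("myfile.csv"))) {
        String[] nextLine;
        // readNext() reads the next line (record) and splits it into a String array; it returns null at end of file
        while ((nextLine = reader.readNext()) != null) {
            System.out.println(Arrays.toString(nextLine)); // print the fields, not the array reference
        }
    }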
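The loop above only prints records. To actually keep the first rows, the same streaming idea can copy lines to a temporary file and then move that file over the original, so memory use stays constant no matter how large the input is. This is a minimal sketch using only the JDK (not OpenCSV); the class and method names (CsvHead, keepFirstLines) are hypothetical, it assumes UTF-8 input, and it counts physical lines, so a quoted field containing a line break would be counted as more than one row.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CsvHead {
    // Hypothetical helper: keeps only the first n lines of the file at `path`.
    // Assumes UTF-8 and one CSV row per physical line.
    public static void keepFirstLines(Path path, int n) throws IOException {
        // Temp file in the same directory, so the final move stays on one file system
        Path tmp = Files.createTempFile(path.toAbsolutePath().getParent(), "head", ".csv");
        try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            int count = 0;
            // Stop as soon as n lines are copied; the rest of the file is never read
            while (count < n && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // writes the platform line separator
                count++;
            }
        }
        // Replace the original with the shortened copy
        Files.move(tmp, path, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For the question's case this would be called as CsvHead.keepFirstLines(Path.of("myfile.csv"), 100). If the copy fails partway, the move never happens and the original file is left untouched (the temporary file is left behind). Note that newLine() normalizes line endings to the platform default.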

Answer:

If you need only a hundred lines, reading just that small portion of the file into memory is quick and cheap. The Kotlin standard library file APIs make this easy:

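    import java.io.File

    // useLines reads lazily and closes the file; take(100) stops after 100 lines
    val firstHundredLines = File("test.csv").useLines { lines ->
        lines.take(100).joinToString(separator = System.lineSeparator())
    }
    File("test.csv").writeText(firstHundredLines) // overwrite the original file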

Answer:

A possible solution in plain Java:

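    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.nio.file.Files;
    import java.nio.file.StandardOpenOption;
    import java.util.stream.Collectors;

    File file = new File(fileName);

    // collect the first N lines; lines() is lazy, so limit(N) stops reading once N lines are collected
    String newContent;
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        newContent = reader.lines().limit(N).collect(Collectors.joining(System.lineSeparator()));
    }

    // replace the original file with the collected content
    // (the enclosing method must declare or handle IOException)
    Files.write(file.toPath(), newContent.getBytes(), StandardOpenOption.TRUNCATE_EXISTING);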

Answer:

The other answers build the shortened content separately and write it back over the file. As I understand it, you want to truncate the original file in place instead. You can do that easily using RandomAccessFile:

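    import java.io.RandomAccessFile;

    try (RandomAccessFile file = new RandomAccessFile(FILE, "rw")) {
        // skip over the first N lines; the file pointer ends up just after line N
        for (int i = 0; i < N && file.readLine() != null; i++)
            ;  // just keep reading
        // cut the file off at the current position
        file.setLength(file.getFilePointer());
    }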

The caveat is that this truncates after N lines, which is not necessarily the same as N rows: a CSV row can span several lines when a quoted field contains a line break. If you are sure every row fits on a single line, the code above works.
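If rows may span lines, one option is to scan the raw bytes and count only line breaks that fall outside quoted fields, then truncate in place. The sketch below assumes RFC 4180-style quoting (fields wrapped in double quotes, embedded quotes doubled) and an ASCII-compatible encoding such as UTF-8, where the quote and newline bytes cannot occur inside a multi-byte character. The class and method names (CsvTruncate, truncateAfterRows) are hypothetical. A doubled quote toggles the in-quotes flag twice, so it needs no special handling.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CsvTruncate {
    // Hypothetical helper: truncates the file in place right after its first n CSV records.
    // Assumes RFC 4180-style quoting and an ASCII-compatible encoding such as UTF-8.
    public static void truncateAfterRows(Path path, int n) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE);
             InputStream in = new BufferedInputStream(Channels.newInputStream(ch))) {
            long pos = 0;                  // bytes consumed so far
            long cut = (n <= 0) ? 0 : -1;  // truncation point; -1 means keep the whole file
            int rows = 0;
            boolean inQuotes = false;
            int b;
            while (cut < 0 && (b = in.read()) != -1) {
                pos++;
                if (b == '"') {
                    // an escaped quote ("") toggles twice, leaving the state unchanged
                    inQuotes = !inQuotes;
                } else if (b == '\n' && !inQuotes && ++rows == n) {
                    cut = pos;             // keep everything up to and including this line break
                }
            }
            if (cut >= 0) {
                ch.truncate(cut);          // drop everything after the Nth record
            }
        }
    }
}
```

For the question this would be CsvTruncate.truncateAfterRows(Path.of("myfile.csv"), 100). Like the RandomAccessFile version, it stops reading once the Nth record ends and cuts the file without copying anything. It does not validate the CSV, though: an unbalanced quote near the top could make it scan much further than intended, and if it finds fewer than N record ends it leaves the file unchanged.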
