What is the fastest way to read a line from a very large file when you know the line number

Time:03-12

At first I loop through the file line by line, and in an array I keep track of specific line numbers that I will need to reference later.

The file is very large (say 1 GB or more), so once I have scanned it and loaded the specific line numbers into an array, the file is no longer in memory.

What would be the fastest and most efficient way to read a specific line in a file?

The file contains text separated by line breaks, where each line represents a transaction.

Instead of the line number, would it make more sense to somehow store the byte offset?

CodePudding user response:

What is the fastest way to read a line from a very large file when you know the line number.

The following assumes that the file is too large to hold in memory, either as a data structure or by memory mapping it.

If you just know the line number and you are reading just one line, then the fastest (simple) way will be to read with BufferedReader.read and count the line separators.
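As a rough sketch of that one-off lookup (the class name `SingleLineLookup` and the 0-based line numbering are illustrative assumptions, not part of the question), the following counts `'\n'` characters with `BufferedReader.read` and then lets `readLine` return the target line. It assumes lines end in `\n` or `\r\n`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for a single lookup by line number.
class SingleLineLookup {

    /**
     * Returns the line with the given 0-based number,
     * or null if the file has fewer lines than that.
     */
    public static String readLine(Path file, long lineNumber, Charset charset)
            throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            long skipped = 0;
            int c;
            // Skip whole lines by counting '\n' characters.
            while (skipped < lineNumber && (c = reader.read()) != -1) {
                if (c == '\n') {
                    skipped++;
                }
            }
            if (skipped < lineNumber) {
                return null; // reached end of file before the requested line
            }
            // readLine strips the trailing "\n" or "\r\n"; it returns null at EOF.
            return reader.readLine();
        }
    }
}
```

Each call rescans the file from the start, so this only makes sense when lookups are rare.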

If you are doing the operation multiple times per file, then it is more complicated to do this efficiently. Firstly, you need a data structure to map line numbers to file offsets. This has to be created by reading the file, counting lines and recording byte offsets. There are various ways to represent the mapping, but an array of offsets will be the most memory efficient ... and the fastest if you are only going to use it to map a line number to a byte offset.
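A minimal sketch of building such an offset array might look like this. The class name `LineOffsetIndex` is hypothetical, and the code assumes an encoding in which the byte `0x0A` only ever means a line feed (true for UTF-8 and ISO-8859-1, not for UTF-16).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical index builder: offsets[i] is the byte offset where line i starts.
class LineOffsetIndex {

    public static long[] build(Path file) throws IOException {
        long[] offsets = new long[1024];
        int count = 0;
        offsets[count++] = 0L;      // line 0 starts at byte 0
        long position = 0;          // number of bytes consumed so far

        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                position++;
                if (b == '\n') {
                    if (count == offsets.length) {
                        offsets = Arrays.copyOf(offsets, count * 2); // grow the array
                    }
                    offsets[count++] = position; // next line starts after the '\n'
                }
            }
        }
        // An offset equal to the file length does not start a real line
        // (empty file, or file ending with a newline), so drop it.
        if (offsets[count - 1] == position) {
            count--;
        }
        return Arrays.copyOf(offsets, count);
    }
}
```

A `long[]` costs 8 bytes per line, so even tens of millions of lines stay within a few hundred megabytes; if only some lines matter, the same loop could record offsets for just those.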

Unfortunately:

  • you cannot get the current byte offset from a Reader, and
  • you cannot "seek" a Reader to a particular byte offset (or character offset).

A reader stack is translating encoded data to a sequence of char values. One char value returned by Reader.read is not necessarily represented by a single byte.

Thus, to implement efficient line no -> byte offset -> line of text retrieval, you will need to use a BufferedInputStream or similar, and make your code aware of the charset of the file you are reading. You will need to use new String(byte[],...,Charset) or similar to get the lines as Java String objects.
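One way to sketch the retrieval step is with `RandomAccessFile`, chosen here as the "or similar" option because it supports `seek` directly. The class and method names are hypothetical, and the code assumes the offset points at the first byte of a line and that `0x0A` only occurs as a line feed in the file's charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

// Hypothetical reader: byte offset -> line of text.
class OffsetLineReader {

    public static String readLineAt(RandomAccessFile file, long offset, Charset charset)
            throws IOException {
        file.seek(offset);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        boolean done = false;
        int n;
        // Read in chunks until a '\n' or the end of the file is reached.
        while (!done && (n = file.read(chunk)) != -1) {
            int end = n;
            for (int i = 0; i < n; i++) {
                if (chunk[i] == '\n') {
                    end = i;
                    done = true;
                    break;
                }
            }
            line.write(chunk, 0, end);
        }
        byte[] bytes = line.toByteArray();
        int length = bytes.length;
        if (length > 0 && bytes[length - 1] == '\r') {
            length--; // drop the '\r' of a "\r\n" separator
        }
        // Decode the raw bytes only now, using the file's charset.
        return new String(bytes, 0, length, charset);
    }
}
```

Keeping one `RandomAccessFile` open across lookups avoids reopening the file each time; it is not safe to share between threads without synchronization because `seek` and `read` mutate a common file pointer.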

The second problem is that each time your application reads the line with a given line number, it will first need to "seek" the stream to the right byte offset. That may entail a seek system call followed by a read system call. Depending on the access patterns, it may be advisable to implement some kind of caching. You could cache just the line that you read ... or you could cache preceding and / or following lines. The best strategy will depend on access patterns, etc, and probably can only be determined experimentally.
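As one possible starting point for the simplest strategy (caching just the lines that were read), here is a small least-recently-used cache built on `LinkedHashMap`. The `LineCache` class and its `loader` callback are hypothetical; the loader stands in for whatever code seeks to the offset and reads the line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical LRU cache: line number -> line text.
class LineCache {
    private final LongFunction<String> loader;
    private final Map<Long, String> cache;

    public LineCache(final int capacity, LongFunction<String> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > capacity; // evict once the capacity is exceeded
            }
        };
    }

    /** Returns the cached line, loading (and caching) it on a miss. */
    public String get(long lineNumber) {
        String line = cache.get(lineNumber);
        if (line == null) {
            line = loader.apply(lineNumber); // e.g. seek + read from the file
            if (line != null) {
                cache.put(lineNumber, line);
            }
        }
        return line;
    }
}
```

Whether a cache like this helps at all depends on how often the same lines are requested again, which is why measuring with realistic access patterns is the safer guide.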

(It might be simpler and/or more efficient to store the lines in a database rather than reading from a 1GB text file. A database will typically do some server-side caching for you, and there are various ways of implementing client-side caching.)

CodePudding user response:

You can use BufferedReader to read from a file; it is faster than unbuffered reading. Put simply, when you ask a BufferedReader for a line, it reads a larger block of the file (typically several lines) at once, gives you the line you asked for, and keeps the rest in its buffer. When you ask for the next line, it first checks the buffer and, if the line is there, returns it from the buffer, so it does not have to go back to the disk for every read.

BufferedReader.readLine()!
