How to import from a pipe-delimited text file in Deno using BufReader?

Time:12-11

I'm trying to read a pipe-delimited file into Deno (before storing it in an SQLite db), and I've run into a problem of comprehension. I thought I'd try to get a simple parse working, before trying to get the pipe to work.

Here's the data/test.csv file:

name|age
Bob|53
Alice|47

Here's my initial code:

import { parse as parseCsv } from "https://deno.land/std@{STD_VERSION}/encoding/csv.ts";

const content = await parseCsv(await Deno.readTextFile("data/test.csv"), {
  skipFirstRow: true,
  separator: "|",
});

console.log(content);

I'm running it with deno run --allow-read db.js, which gives me back the expected:

[ { name: "Bob", age: "53" }, { name: "Alice", age: "47" } ]

However, I'm worried that the source data could get quite large, so I was hoping to use a BufReader:

import { parse as parseCsv } from "https://deno.land/std@{STD_VERSION}/encoding/csv.ts";
import { BufReader } from "https://deno.land/std@{STD_VERSION}/io/mod.ts";

const file = await Deno.open("data/test.csv");
const content = await parseCsv(new BufReader(file), {
  skipFirstRow: true,
  separator: "|",
});

console.log(content);

but the result I get is simply: [].

Is there something obvious I'm missing? Some sort of while loop?

Answer:

Deno includes in its std library a class that extends the native TransformStream which can be used to parse and iterate a CSV-formatted ReadableStream. This provides the benefit of not only being able to parse file streams, but any other streams (such as a streaming response to a network request).

The class is called CsvStream and is exported from the module at https://deno.land/std@{STD_VERSION}/encoding/csv/stream.ts.

Here's a link to the line number in the current version of the std library as I write this answer: https://deno.land/[email protected]/encoding/csv/stream.ts?source#L42
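
To make the TransformStream idea concrete, here's a minimal sketch of the general technique: a hand-rolled transform that buffers decoded text chunks, cuts them into lines, and splits each line on a separator. This is not how CsvStream is implemented, and the names delimitedLineStream, streamFromChunks, and collectRows are made up for illustration. It assumes there are no quoted fields or embedded newlines, which a real CSV parser has to handle.

```typescript
// Hypothetical, simplified stand-in for a CSV transform stream.
// Assumption: fields are never quoted and never contain newlines.
function delimitedLineStream(
  separator = "|",
): TransformStream<string, string[]> {
  let buffer = ""; // holds an incomplete trailing line between chunks
  return new TransformStream<string, string[]>({
    transform(chunk, controller) {
      buffer += chunk;
      const lines = buffer.split(/\r?\n/);
      buffer = lines.pop() ?? ""; // the last piece may be a partial line
      for (const line of lines) {
        if (line !== "") controller.enqueue(line.split(separator));
      }
    },
    flush(controller) {
      // Emit the final line if the input did not end with a newline
      if (buffer !== "") controller.enqueue(buffer.split(separator));
    },
  });
}

// Builds an in-memory byte stream, standing in for file.readable
// or a network response body.
function streamFromChunks(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
}

// Decodes bytes to text, parses rows, and collects them into an array.
async function collectRows(
  source: ReadableStream<Uint8Array>,
  separator = "|",
): Promise<string[][]> {
  const reader = source
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(delimitedLineStream(separator))
    .getReader();
  const rows: string[][] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    rows.push(value);
  }
  return rows;
}
```

Calling collectRows(streamFromChunks(["name|a", "ge\nBob|53\nAli", "ce|47\n"])) resolves to the same three rows as the file in your question, even though the chunks break in the middle of lines. For real data you'd want a proper CSV parser such as CsvStream, which is what the rest of this answer uses.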

Below is a basic example of how to use CsvStream with the example file in your question.

./db.ts:

import {
  CsvStream,
  type CsvStreamOptions,
} from "https://deno.land/std@{STD_VERSION}/encoding/csv/stream.ts";

/**
 * A helper async generator function which yields the parsed CSV rows
 */
async function* iterateCsvRows(
  filePath: string,
  options?: CsvStreamOptions,
): AsyncGenerator<string[], void> {
  // Open the file to get a handle:
  const file = await Deno.open(filePath);

  // Pipe the file's ReadableStream (Uint8Array chunks)
  // through a TextDecoderStream, then pipe those string chunks
  // through the CsvStream:
  const readable = file.readable
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(new CsvStream(options));

  // Yield the resulting parsed string arrays:
  for await (const stringArray of readable) yield stringArray;

  // The file auto-closes after the ReadableStream finishes and closes
}

async function main() {
  const csvPath = "data/test.csv";

  const csvStreamOptions: CsvStreamOptions = {
    separator: "|",
  };

  for await (const row of iterateCsvRows(csvPath, csvStreamOptions)) {
    console.log(row);
  }
}

if (import.meta.main) main();

% deno --version
deno 1.28.2 (release, x86_64-apple-darwin)
v8 10.9.194.1
typescript 4.8.3

% cat data/test.csv 
name|age
Bob|53
Alice|47

% deno run --allow-read db.ts
[ "name", "age" ]
[ "Bob", "53" ]
[ "Alice", "47" ]


You can, of course, adapt the iterateCsvRows example above to take an option indicating that the first row contains headers. The function can then parse that row separately and use its values as keys, yielding an object for every subsequent row with each value assigned to the corresponding key. Here's an example of what that might look like using an overloaded function signature:

./db.ts:

import { assert } from "https://deno.land/std@{STD_VERSION}/testing/asserts.ts";
import {
  CsvStream,
  type CsvStreamOptions,
} from "https://deno.land/std@{STD_VERSION}/encoding/csv/stream.ts";

/**
 * A helper async generator function which yields parsed CSV rows
 */
function iterateCsvRows(
  filePath: string,
  options: CsvStreamOptions & { includesHeaders: true },
): AsyncGenerator<Record<string, string>, void>;
function iterateCsvRows(
  filePath: string,
  options: CsvStreamOptions & { includesHeaders: false },
): AsyncGenerator<string[], void>;
function iterateCsvRows(
  filePath: string,
  options: CsvStreamOptions & { includesHeaders: boolean },
): AsyncGenerator<string[] | Record<string, string>, void>;
function iterateCsvRows(
  filePath: string,
  options?: CsvStreamOptions,
): AsyncGenerator<string[], void>;
async function* iterateCsvRows(
  filePath: string,
  options?: CsvStreamOptions & { includesHeaders?: boolean },
) {
  const { includesHeaders, ...opts } = options ?? {};
  const file = await Deno.open(filePath);

  const readable = file.readable
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(new CsvStream(opts));

  const headers: string[] = [];

  if (includesHeaders) {
    const reader = readable.getReader();
    const { done, value } = await reader.read();
    reader.releaseLock();
    assert(!done, "Stream data not found");
    headers.push(...value);
  }

  for await (const stringArray of readable) {
    if (!includesHeaders) {
      yield stringArray;
      continue;
    }

    assert(
      stringArray.length === headers.length,
      `Expected ${headers.length} values, but encountered ${stringArray.length}`,
    );

    const obj: Record<string, string> = {};

    for (let i = 0; i < headers.length; i += 1) {
      obj[headers[i]] = stringArray[i];
    }

    yield obj;
  }
}

async function main() {
  const csvPath = "data/test.csv";

  console.log("Headers option off:");
  const iter = iterateCsvRows(csvPath, { separator: "|" });
  for await (const row of iter) console.log(row);

  console.log("Headers option on:");
  const iterUsingHeaders = iterateCsvRows(csvPath, {
    separator: "|",
    includesHeaders: true,
  });

  for await (const obj of iterUsingHeaders) console.log(obj);
}

if (import.meta.main) main();

% deno --version
deno 1.28.2 (release, x86_64-apple-darwin)
v8 10.9.194.1
typescript 4.8.3

% cat data/test.csv 
name|age
Bob|53
Alice|47

% deno run --allow-read db.ts
Headers option off:
[ "name", "age" ]
[ "Bob", "53" ]
[ "Alice", "47" ]
Headers option on:
{ name: "Bob", age: "53" }
{ name: "Alice", age: "47" }
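
If the overloads feel heavy, one alternative is to keep the header handling in a separate helper that works on any async iterable of string arrays. The sketch below is just one way to do that. The name withHeaders is hypothetical, and it assumes the first row it receives is the header row.

```typescript
/**
 * Hypothetical helper: treats the first row as headers and yields
 * one object per subsequent row, keyed by those headers.
 */
async function* withHeaders(
  rows: AsyncIterable<string[]>,
): AsyncGenerator<Record<string, string>, void> {
  let headers: string[] | undefined;

  for await (const row of rows) {
    // Assumption: the first row seen is the header row
    if (headers === undefined) {
      headers = row;
      continue;
    }

    if (row.length !== headers.length) {
      throw new Error(
        `Expected ${headers.length} values, but encountered ${row.length}`,
      );
    }

    const obj: Record<string, string> = {};
    for (let i = 0; i < headers.length; i += 1) {
      obj[headers[i]] = row[i];
    }
    yield obj;
  }
}
```

With the first version of iterateCsvRows from this answer, usage would look something like for await (const obj of withHeaders(iterateCsvRows(csvPath, { separator: "|" }))) console.log(obj). The trade-off is that the caller decides whether headers apply by choosing whether to wrap the iterator, instead of passing an option.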
