How to load a CSV into a dictionary (hash)?


I'm dealing with (unsorted) CSV files, where certain columns together form a key and another column holds the value. Let's say columns (1, 2, 3, 7) of each row form the key, and the 11th column is the row's value.

I'd like to load these into hashes so as to be able to quickly look up a key's value. I'm new to NumPy and am impressed with the speed and ease of use of numpy.loadtxt() -- I can give it exactly the columns I'm interested in, and it loads very large CSV files quickly.
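For reference, the call I have in mind is roughly the following (just a sketch: data.txt and the column numbers are as described above, and I'm assuming numeric, whitespace-separated data):

import numpy as np

# Load only the key columns (1, 2, 3, 7) and the value column (11);
# loadtxt splits on whitespace and parses floats by default.
arr = np.loadtxt('data.txt', usecols=(1, 2, 3, 7, 11))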

But it loads them into a flat ndarray -- not a dictionary. How would I get something hash-like, with speed and ease similar to loadtxt()?

Reading a line at a time in Python -- for example with the csv module -- is slow. Looping over the ndarray to duplicate the data into a hash is both slow and wasteful.
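To make that concrete, this is the sort of copy loop I'd like to avoid (same assumptions as the sketch above):

import numpy as np

arr = np.loadtxt('data.txt', usecols=(1, 2, 3, 7, 11))

# Copy every row into a Python dict: columns 1, 2, 3, 7 become the key,
# column 11 the value. This duplicates the data and loops in pure Python.
d = {tuple(row[:4]): row[4] for row in arr}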

Is there some other quick one-liner, perhaps? Or some clever use of the same loadtxt() that will produce a hash (or hash-like) object with $O(\log n)$ lookups?

CodePudding user response:

You can try:

import pandas as pd

d = pd.read_csv('data.txt', header=None, usecols=[1, 2, 3, 7, 11],
                index_col=[0, 1, 2, 3], sep=' ')[11].to_dict()

Some information:

  • header=None: treat the first line as data rather than column names
  • usecols=[1, 2, 3, 7, 11]: read only the useful columns
  • index_col=[0, 1, 2, 3]: use columns 1, 2, 3, 7 (positions 0-3 of the selected columns) as the index, i.e. the key
  • sep=' ': split on spaces (loadtxt splits on whitespace by default; adjust to your file's actual delimiter)
  • [11]: extract the value column
  • .to_dict(): convert the resulting Series to a dict
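For example, on a small made-up whitespace-separated sample (invented data, just to show the shape of the result), the same call produces a dict keyed by tuples of columns 1, 2, 3 and 7:

import io
import pandas as pd

# Two invented rows with 12 space-separated columns (0..11)
sample = io.StringIO(
    "a b c d e f g h i j k 1.5\n"
    "a b c d e f g x i j k 2.5\n"
)

d = pd.read_csv(sample, header=None, usecols=[1, 2, 3, 7, 11],
                index_col=[0, 1, 2, 3], sep=' ')[11].to_dict()

print(d)                        # {('b', 'c', 'd', 'h'): 1.5, ('b', 'c', 'd', 'x'): 2.5}
print(d[('b', 'c', 'd', 'h')])  # 1.5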

CodePudding user response:

Python has this ability built-in:

import csv

d = {}
with open('data.txt', 'r', newline='') as fin:
    # No header row, so use csv.reader rather than DictReader (which needs field names)
    reader = csv.reader(fin, delimiter=' ')
    for row in reader:
        # Columns 1, 2, 3, 7 form the key; column 11 is the value
        d[(row[1], row[2], row[3], row[7])] = row[11]
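With the dictionary built this way, a lookup is just a normal dict access on a tuple of the four key-column values, e.g. d[(k1, k2, k3, k7)] (placeholder names here).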