Home > Blockchain >  CSV to Dictionary with Column headers as Keys in C#
CSV to Dictionary with Column headers as Keys in C#

Time:06-19

I would like to read in a CSV into a dictionary, however most examples I see have the first column as keys. I would like the field names to be keys.

CSV File:

Name,Type,Classification
file1.txt,text,Secondary
file2.txt,text,Primary

Output dictionary (what I would like):

dict = {
     Name: [file1.txt, file2.txt],
     Type: [text, text],
     Classification: [Secondary, Primary]
}

I'm not sure if I can extend this somehow:

    private void LoadCSV()
    {
        var mydict = new Dictionary<string, List<string>>();
        var lines = File.ReadAllLines(@"C:\test.csv");

        string[] columnHeaders = lines[0].Split(',');
        foreach (string columnHeader in columnHeaders)
        {
            mydict[columnHeader] = new List<string>();
        }
    }

Edit:

This may be doing what I want, but perhaps not very efficient:

private void LoadCSV()
{
    var mydict = new Dictionary<string, List<string>>();
    var lines = File.ReadAllLines(@"C:\test.csv");
    string[] columnHeaders = lines[0].Split(',');
    string[] allLines = lines.Skip(1).ToArray();

    int count = 0;
    foreach (string headerPart in columnHeaders)
    {
        var valuelist = new List<string>();
        foreach (string line in allLines)
        {
            List<string> l = line.Split(',').ToList();
            var element = l.ElementAt(count);
            valuelist.Add(element);
        }
        mydict[headerPart] = valuelist;
        count  ;

    }
}

CodePudding user response:

Good start! Now that you have column headers, you can build the dictionary. Obviously, you need to keep track of matching column numbers to column names

string[] columnHeaders = lines[0].Split(',');

// do this for each line in the file
string[] columnRows = lines.Split(',');

for (int n = 0; n  ; n <= columnRows.Length)
{
    mydict[columnHeaders[n]] = mydict[columnHeaders[n]].Add(columnRows[n]);
}

this way you add a new value to the List that is value of the Dictionary at given index

CodePudding user response:

As an alternative to the imperative style you are using, here is a way to perform this using Linq:

var csvPath = @"C:\test.csv";
var separator = ",";

var lines = File.ReadLines(csvPath);

var indexedKeys = lines
    .First()
    .Split(separator, StringSplitOptions.None)
    .Select((k, i) => 
    (
        Index: i,
        Key: k
    ));

var values = lines
    .Skip(1)
    .Select(l => l.Split(separator, StringSplitOptions.None))
    .ToList();

var result = indexedKeys
    .ToDictionary(
        ik => ik.Key,
        ik => values
            .Select(v => v[ik.Index])
            .ToList());

As for your solution efficiency (the code below Edit in your post), with a basic analysis, there are two potential improvements:

File.ReadAllLines unnecessaryly put your whole file in memory. Using File.ReadLine solves that. See this and this for discussion.

You are applying the .Split() method not once on each line, but as many time as there are columns. To fix this, your variable allines should already contains the splited lines. The solution I propose does that, see the variable values.

  • Related