Home > Blockchain >  Select Row of CSV File
Select Row of CSV File

Time:09-01

I have a CSV File with these headers:

date;clock;value

My aim is to select the CSV line with a specific date to get the corresponding value.

For example: I want to select date 20.08.22 and the result should be 130

15.08.22;07:05;100
20.08.22;08:04;130
21.08.22;10:04;150

With this code snippet I read the lines of the csv file:

private void Werte_aus_CSV_auslesen()
{
    var path = @"E:\werte.csv"; 

    using (TextFieldParser csvParser = new TextFieldParser(path))
    {
        csvParser.CommentTokens = new string[] { "#" };
        csvParser.SetDelimiters(new string[] { ";" });
        csvParser.HasFieldsEnclosedInQuotes = true;

        // Skip the row with the column names
        csvParser.ReadLine();

        while (!csvParser.EndOfData)
        {
            // Read current line fields, pointer moves to the next line.
            fields = csvParser.ReadFields();                    

            datum.Add(fields[0]);
            uhrzeit.Add(fields[1]);
            wert.Add(double.Parse(fields[2],  CultureInfo.InvariantCulture));
        }
    }
}

CodePudding user response:

The approach you are using is going to have to scan the entire CSV every time you lookup a value. This might be a performance problem if this method is called multiple times. It would be better to build a dictionary that maps the date to the value that can be built once and reused for each subsequent lookup.

I maintain a couple libraries that make this pretty easy: Sylvan.Data and Sylvan.Data.Csv. Here is a complete C# 10 console app that demonstrates how to accomplish this:

using Sylvan.Data.Csv;
using Sylvan.Data;

// data: would normally use CsvDataReader.Create(csvFileName, opts);
var data =
    new StringReader(
    @"date;clock;value
15.08.22;07:05;100
20.08.22;08:04;130
21.08.22;10:04;150
");

// parameter:
var selectDate = new DateTime(2022, 8, 20);

// configure settings so the csv reader understands your data
var opts = new CsvDataReaderOptions
{
    DateTimeFormat = "dd'.'MM'.'yy",
    // ignore clock, as it isn't used
    Schema = new CsvSchema(Schema.Parse("date:date,clock,value:int"))
};

var csvReader = CsvDataReader.Create(data, opts);

// create a dictionary to cache the CSV data for quick lookups
// creating the dictionary scans the whole dataset, but subsequent lookups will
// be blazing fast.
{
    var dict =
        csvReader
        .GetRecords<Record>() // bind the CSV data to the Record class
        .ToDictionary(r => r.Date, r => r.Value);

    Console.WriteLine(dict.TryGetValue(selectDate, out var value) ? value.ToString() : "Value not found");
}

class Record
{
    public DateTime Date { get; set; }
    public int Value { get; set; }
}

CodePudding user response:

Matched arrays/lists like datum, uhrzeit, and wert that relate values within each collection based on index is an anti-pattern... something to avoid. So much better to create a class with fields for each of the values, and then have one collection to hold the class.

public class MyData 
{
    public DateTime date {get;set;}
    public int value {get;set;}
}

Newer code might also use a record instead of a class.

We can further improve this by separating the code to read the csv data from the code that composes the objects. Start with something like this:

private IEnumerable<string[]> Werte_aus_CSV_auslesen(string path)
{
    using (TextFieldParser csvParser = new TextFieldParser(path))
    {
        csvParser.CommentTokens = new string[] { "#" };
        csvParser.SetDelimiters(new string[] { ";" });
        csvParser.HasFieldsEnclosedInQuotes = true;

        // Skip the row with the column names
        csvParser.ReadLine();

        while (!csvParser.EndOfData)
        {
            // Read current line fields, pointer moves to the next line.
            yield return csvParser.ReadFields();                    
        }
    }
}

Notice how it accepts an input and returns an object (the enumerable with the data). Also notice how it avoids anything to do with processing the individual rows. It is only concerned with parsing the CSV/SSV inputs.

Now we can add an additional method to transform a string[] into a class instance. I like to start out with this as a static method of the class itself, but as a project grows to have many of these methods they will eventually be moved to their own static type:

public class MyData 
{
    public DateTime date {get;set;}
    public int value {get;set}

    public static MyData FromCSVRow(string[] input)
    {
        return new MyData() {
          date = DateTime.ParseExact($"{input[0]} {input[1]}", "dd.MM.yy HH:mm", null),
          value = int.Parse(input[2])
        };
    }
}

And now finally, with all that out of the way, we can put it all together to get your answer like this:

var targetDate = new DateTime(2022, 8, 20);
var csv = Werte_aus_CSV_auslesen(@"E:\werte.csv");
var rows = csv.Select(MyData.FromCSV);
var result = rows.Where(r => r.date.Date == targetDate);

If we really wanted to, we could even treat all that as a single line of code (it's probably better to keep it separate, for readability/maintainability):

var result = Werte_aus_CSV_auslesen(@"E:\werte.csv").
     Select(MyData.FromCSV).
     Where(r => r.date.Date == new DateTime(2022, 8, 20));

Note result is still an IEnumerable, because there might be more than one row that matches the criteria. If you are really sure there will only be one matching record, you can use this:

var result = rows.Where(r => r.date.Date == targetDate).FirstOrDefault();

or this:

var result = rows.Where(r => r.date.Date == targetDate).First();

depending on what you want to happen if no match is found.

  • Related