Home > Net >  Fetching a row from one csv file to search for a matching row in a different csv file
Fetching a row from one csv file to search for a matching row in a different csv file

Time:05-25

I have stumbled upon a problem where i have a csv file generated with the help of CSVhelper. The file consists of 2 columns (Timestamp and ID). I would now like to open a different CSV file that contains IDs and descriptions for these IDs and import the description to the other CSVfile to match the ID with a description.

I have not however found any information on how to perform this. And wondered if there is any smooth way of performing this.

EDIT:

To clarify: I have one CSV file which contains different events which looks similar to this:

17/05/2022 16.28, 2 
17/05/2022 16.28, 3 

Where the 1,2,3 and 4 represents the ID of the event that has happened. I would like to fetch a description for the ID from a different CSV file. Which looks like this:

1, problem 1
2, problem 2
3, problem 3
4, problem 4

So I would like to fetch the description for the ID 2 and 3 and insert it next to the ID so the end result would look like this:

17/05/2022 16.28, 2, problem 2
17/05/2022 16.28, 3, problem 3

CodePudding user response:

@Peter Smith has the right idea. Something like this should work.

void Main()
{
    List<Foo> records;

    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        HasHeaderRecord = false
    };

    using (var reader = new StreamReader("path\\to\\file.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var options = new TypeConverterOptions { Formats = new[] { "dd/MM/yyyy mm.ss" } };

        csv.Context.TypeConverterOptionsCache.AddOptions<DateTime>(options);
        
        csv.Context.RegisterClassMap<FooMap>();

        records = csv.GetRecords<Foo>().ToList();
    }

    using (var reader = new StreamReader("path\\to\\descriptions.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var descriptions = csv.GetRecords<dynamic>().ToList();

        records = records.Join(
            descriptions, 
            record => record.Id, 
            description => description.Field1, 
            (record, description) => { record.Description = description.Field2; return record; }).ToList();
    }
    
    using (var writer = new StreamWriter("path\\to\\file.csv"))
    using (var csv = new CsvWriter(writer, config))
    {
        var options = new TypeConverterOptions { Formats = new[] { "dd/MM/yyyy mm.ss" } };

        csv.Context.TypeConverterOptionsCache.AddOptions<DateTime>(options);
        
        csv.WriteRecords(records);
    }
}

public class FooMap : ClassMap<Foo>
{
    public FooMap() 
    {
        Map(x => x.Id).Index(1);
        Map(x => x.Timestamp).Index(0);
    }
}

public class Foo
{
    [Index(1)]
    public string Id { get; set; }
    [Index(0)]
    public DateTime Timestamp { get; set; }
    [Index(2)]
    public string Description { get; set; }
}

CodePudding user response:

As mentioned by Peter Smith, you can use a left outer join to match your input CSV rows with descriptions from your description CSV via ID.

First, define the following simple record struct:

public readonly record struct Data(string Timestamp, int ID, string Description);

Next, create the following three class maps for your input, description and output CSV files defining the indices for each column:

public sealed class InputMap : ClassMap<Data>
{
    public InputMap()
    {
        Map(m => m.Timestamp).Index(0);
        Map(m => m.ID).Index(1);
    }
}   

public sealed class OutputMap : ClassMap<Data>
{
    public OutputMap()
    {
        Map(m => m.Timestamp).Index(0);
        Map(m => m.ID).Index(1);
        Map(m => m.Description).Index(2);
    }
}   

public sealed class DescriptionMap : ClassMap<Data>
{
    public DescriptionMap()
    {
        Map(m => m.ID).Index(0);
        Map(m => m.Description).Index(1);
    }
}   

And now you can join the inputs and outputs as follows:

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Mode  = CsvMode.Escape, // Prevents adding of quotes around the Description on output.
    HasHeaderRecord = false,
};

using var inputReader = new StreamReader(inputFile);
using var inputCsv = new CsvReader(inputReader, config);
inputCsv.Context.RegisterClassMap<InputMap>();
var inputs = inputCsv.GetRecords<Data>();

using var descriptionReader = new StreamReader(descriptionFile);
using var descriptionCsv = new CsvReader(descriptionReader, config);
descriptionCsv.Context.RegisterClassMap<DescriptionMap>();
var descriptions = descriptionCsv.GetRecords<Data>();

using (var writer = new StreamWriter(outputFile))
using (var outputCsv = new CsvWriter(writer, config))
{
    outputCsv.Context.RegisterClassMap<OutputMap>();
    
    var query = 
        from input in inputs
        join d in descriptions on input.ID equals d.ID into g
        select input with { Description = g.SingleOrDefault().Description }; // Throw an exception if there are multiple descriptions for the same ID.
    
    outputCsv.WriteRecords(query);
}

Notes:

  • CsvReader.GetRecords<T>() is lazy so we don't want to close the input files until we have evaluated the query and written the outputs.

  • We use a left outer join so that input rows with missing descriptions will not get skipped.

  • It's not clear what you want to do in the event there are multiple descriptions for the same ID. The code above will throw an exception via the call to g.SingleOrDefault(). Alternatively you could concatenate the descriptions, or emit multiple rows.

  • Since you're just combining results from two different files, I didn't bother to parse the Timestamp into a proper DateTime.

Example fiddle here.

  • Related