Home > Back-end >  How to use ChoETL to compare two CSV files for ADD, CHANGED or DELETED records (Master vs Detail)?
How to use ChoETL to compare two CSV files for ADD, CHANGED or DELETED records (Master vs Detail)?

Time:11-28

I've been playing with @Chintoo's fantastic ETL system for C#. I need to compare two CSV files, where one CSV file is defined as a dynamically growing master table and the other is a feeder "detail" table.

The detail table may have differences in terms of NEW records, CHANGED records, or a record no longer (DELETED) existing in the master CSV file.

The output should be a 3rd table that replaces or updates the master table - so it's a growing CSV file.

Both tables have unique ID columns and a header row.

MASTER CSV

ID,name
1,Danny
2,Fred
3,Sam

DETAIL

ID,name
1,Danny
          <-- record no longer exists
3,Pamela <-- name change
4,Fernando   <-- new record

So far I've been referring to this fiddle, and the code below:

using System;
using ChoETL;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var input1 = ChoCSVReader.LoadText(csv1).WithFirstLineHeader().ToArray();
        var input2 = ChoCSVReader.LoadText(csv2).WithFirstLineHeader().ToArray();

        Console.WriteLine("NEW records\n");
        using (var output = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
        {
            output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), 
                                                                  new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
        }
        
        Console.WriteLine("\n\nDELETED records\n");
        using (var output = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
        {
            output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), 
                                                                  new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
        }
        
        Console.WriteLine("\n\nCHANGED records\n");
        using (var output = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
        {
            output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), 
                                                                  new ChoDynamicObjectEqualityComparer(new string[] { "id", "name" })));
        }
    }
    
    static string csv1 = @"
ID,name
1,Danny
2,Fred
3,Sam";
    
    static string csv2 = @"
ID,name
1,Danny
3,Pamela
4,Fernando";
}

OUTPUT

NEW records

ID,name
4,Fernando

DELETED records

ID,name
2,Fred

CHANGED records

ID,name
2,Fred
3,Sam

The CHANGED records is not working. As an added extra, I need a status so I want it to look like this:

CHANGED records
    
ID,name,status
1,Danny,NOCHANGE
2,Fred,DELETED
3,Pamela,CHANGED
4,Fernando,NEW

Thanks

CodePudding user response:

Here is how you can do with Cinchoo ETL

            string csv1 = @"ID,name
1,Danny
2,Fred
3,Sam";

            string csv2 = @"ID,name
1,Danny
3,Pamela
4,Fernando";

            var r1 = ChoCSVReader.LoadText(csv1).WithFirstLineHeader().ToArray();
            var r2 = ChoCSVReader.LoadText(csv2).WithFirstLineHeader().ToArray();

            using (var w = new ChoCSVWriter(Console.Out).WithFirstLineHeader())
            {
                var newItems = r2.OfType<ChoDynamicObject>().Except(r1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "ID" }))
                    .Select(r => 
                    {
                        var dict = r.AsDictionary();
                        dict["Status"] = "NEW"; 
                        return new ChoDynamicObject(dict); 
                    }).ToArray();

                var deletedItems = r1.OfType<ChoDynamicObject>().Except(r2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "ID" }))
                    .Select(r =>
                    {
                        var dict = r.AsDictionary();
                        dict["Status"] = "DELETED";
                        return new ChoDynamicObject(dict);
                    }).ToArray();

                var changedItems = r2.OfType<ChoDynamicObject>().Except(r1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default)
                    .Except(newItems.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "ID" }))
                    .Select(r =>
                    {
                        var dict = r.AsDictionary();
                        dict["Status"] = "CHANGED";
                        return new ChoDynamicObject(dict);
                    }).ToArray();

                var noChangeItems = r1.OfType<ChoDynamicObject>().Intersect(r2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default)
                    .Select(r =>
                    {
                        var dict = r.AsDictionary();
                        dict["Status"] = "NOCHANGE";
                        return new ChoDynamicObject(dict);
                    }).ToArray();

                var finalResult = Enumerable.Concat(newItems, deletedItems).Concat(changedItems).Concat(noChangeItems).OfType<dynamic>().OrderBy(r => r.ID);
                w.Write(finalResult);
            }

            Console.WriteLine();

Output:

ID,name,Status
1,Danny,NOCHANGE
2,Fred,DELETED
3,Pamela,CHANGED
4,Fernando,NEW

Sample fiddle: https://dotnetfiddle.net/mrHpFx

  • Related